ABSTRACT


Syntactic pattern recognition is applied to seismic classification
in this study. Its performance is better than that of many existing
statistical approaches. VLSI architectures for syntactic seismic
recognition are also proposed; they take advantage of parallel
processing and pipelining so that constant time complexity is
attainable when processing large amounts of data. Application of
syntactic pattern recognition to damage assessment is also proposed
and demonstrated on a set of experimental data.

Seismic  waveforms  are  represented  by  strings  of  primitives,  i.e., 
sentences,  in  this  study.  String-to-string  similarity  measures  based  on 
both  distance  and  likelihood  concepts  are  discussed  along  with  the 
symmetric  property  and  the  hierarchy.  A  fixed-length  segmentation  is 
used  in  the  experiment.  Encouraging  results  comparable  to  those  of 
the  best  statistical  approaches  are  obtained  with  only  two  very  simple 
features,  namely,  zero-crossing  count  and  log  energy.  Primitives  are 
automatically  selected  using  a  hierarchical  clustering  procedure  and 
two  decision  criteria. 


The nearest-neighbor decision rule and finite-state error-correcting
parsers are used for classification. For error-correcting parsing,
finite-state grammars are first inferred from the training samples.
The two approaches have the same performance in the experiment, but
the nearest-neighbor rule is faster.

Attributed grammars and their parsing are also proposed for seismic
recognition; they could reduce the complexity and increase the
descriptive flexibility of the pattern grammars. VLSI architectures
are proposed for fast recognition of seismic waveforms. Three systolic
arrays perform the feature selection, primitive recognition and string
distance computation. These individual units can be used in other
similar applications.

Although  this  study  is  on  seismic  classification,  it  can  be  extended 
or  modified  to  tackle  other  signal  recognition  problems. 
CHAPTER  I

INTRODUCTION 


1.1  Statement  of  the  Problem 

In the past, seismic wave analysis was confined to the geophysical
field, where underground structure and earthquake analyses are the
most important topics. The major parameters computed from the
recorded seismograms include the location, time, depth and magnitude
of the event.

In the 1960's, a new problem arose when comprehensive nuclear test
ban treaties were proposed: how to discriminate between natural
earthquakes and secret underground nuclear explosions by seismological
methods, which in turn are based on seismic wave recordings (Bolt,
1976; Dahlman and Israelson, 1977). Traditional methods use
information such as time, location, depth, magnitude, complexity and
the ratio of body wave magnitude to surface wave magnitude, and
usually involve human experts. However, these methods are not reliable
for small events and require the involvement of many seismic stations.
Recently, pattern recognition has been applied to the discrimination
between these two categories (see Chen, 1978).


It is sometimes very difficult to distinguish between earthquakes and
explosions by looking at the seismic signals alone. Even an
experienced analyst needs additional information in order to make a
correct classification. According to the source mechanism, an
explosion signal should look more like a pulse and contain higher
frequencies than an earthquake signal, while an earthquake signal
should last longer and look more complex. However, this is not always
true, since the depth of the source, the distance and the geophysical
configuration of the path change the waveform significantly. For
example, the difference between explosion and earthquake is very clear
in Figure 1.1, but not in Figure 1.2 and Figure 1.3, where neither
frequency nor complexity can tell the difference. In pattern
recognition terminology, these two classes overlap.

All the existing pattern recognition applications use the statistical
approach. Since complexity and structural information play an
important role in seismic analysis, it is natural to pursue the
syntactic (structural) approach in seismic pattern analysis. In oil
exploration, the structure of the seismic reflection indicates the
underground structure. In earthquake / explosion classification, the
structural information is the most important feature. The block
diagram of a typical syntactic pattern recognition system is shown in
Figure 1.4. Because the characteristics of the source and environment
are unknown, a seismic grammar is usually difficult to construct
manually. Therefore, grammatical inference techniques will be applied
to infer the pattern grammar from a set of training samples. An
error-correcting parser will also be used because the chance that a
testing sample is perfectly accepted by the inferred grammar is very
slim. This is the rule rather than the exception in many practical
applications, and seismic analysis is one of them, owing to the noise
and uncertainty of the source and background. In addition to the
grammatical approach, we will also use the nearest-neighbor decision
rule for classification. The distance, or similarity, is of course
computed between the string representations of the seismic signals.
The block diagram of a nearest-neighbor classifier for syntactic
patterns is shown in Figure 1.5.

Figure 1.3 Another [...] earthquake wave [...] an earthquake.

Figure 1.4 Block diagram of a syntactic pattern recognition system.

Owing to recent advances in VLSI technology, it is now feasible, and
will soon become economical, to design custom chips for special
applications (Mead and Conway, 1980; Kung, 1979; Ackland, et al.,
1981). A VLSI system for seismic signal recognition will also be
developed in this study.


1.2  Literature  Survey


1.2.1  Syntactic  Pattern  Recognition  and
Digital  Signal  Processing

Applications of syntactic pattern recognition to digital signal
processing have received much attention and achieved considerable
success in the past decade (see Fu, 1982). The most prominent
applications are in the areas of biomedical waveform analysis and
speech recognition. The reason for their success is that these
waveforms have regular and predictable structure. Most biomedical
waveforms, e.g., the ECG wave and the carotid pulse wave, are rhythmic
and are generated by specific organs of the body whose functions are
well understood. It is thus easy to write a grammar for these
waveforms based on those functions. Horowitz (1975, 1977) developed a
syntactic algorithm to detect the peaks of ECG waves. Albus (1977)
used a stochastic finite-state model to interpret ECG signals. Giese,
et al. (1979) proposed a syntactic method to analyze EEG signals.
Stockman, et al. (1976) applied a syntactic method to analyze carotid
pulse waveforms. The major problem in biomedical waveform analysis is
noise, which can be generated by muscles or other sources (Albus,
1977).

Figure 1.5 Block diagram of a syntactic pattern recognition system
using the nearest-neighbor decision rule for string patterns.

It has been shown that speech patterns are related to linguistic
items by a complex set of rules belonging to a "grammar of speech"
(De Mori, 1977). Therefore, the most effective way of detecting and
recognizing speech patterns is a syntactic method. De Mori (1972)
presented a syntactic method to recognize spoken Italian digits. The
major problem in speech recognition is the variability of the speech
patterns, which are speaker-dependent as well as context-dependent.
Even for the same speaker and the same word, the features extracted
from different utterances are usually not the same.

In this section we review some of the existing syntactic methods
applied to signal processing. Although preprocessing is also
important, we do not include it here because it is case dependent and
is usually not related to the recognition stage. We will, however,
discuss the preprocessing procedure later in our experiments on
seismic signal recognition. We now concentrate on the major parts of
a syntactic pattern recognition system, i.e., segmentation, feature
extraction, primitive selection, grammatical inference or
construction, and syntax analysis.
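
As a rough illustration of the first stages listed above, the sketch
below cuts a sampled waveform into fixed-length segments, computes a
zero-crossing count and a log energy for each segment, and maps each
segment to the nearest of a few primitive prototypes. The two features
and the fixed-length segmentation are the ones named in the abstract;
the function names, the prototype table, the Euclidean nearest-prototype
rule and the small constant added before taking the logarithm are
assumptions made only for this example. The prototypes stand in for the
output of a clustering step that is not shown.

```python
import math

def segment_features(signal, seg_len):
    """Return one (zero_crossing_count, log_energy) pair per fixed-length segment."""
    features = []
    # Trailing samples that do not fill a whole segment are ignored.
    for start in range(0, len(signal) - seg_len + 1, seg_len):
        seg = signal[start:start + seg_len]
        # A zero crossing is counted whenever consecutive samples change sign.
        crossings = sum(1 for a, b in zip(seg, seg[1:]) if (a < 0) != (b < 0))
        energy = sum(v * v for v in seg)
        # Assumption: a tiny offset keeps log() defined for an all-zero segment.
        features.append((crossings, math.log(energy + 1e-12)))
    return features

def to_primitive_string(features, prototypes):
    """Label each feature pair with the name of the closest prototype.

    `prototypes` is a hypothetical table {primitive_name: feature_pair}.
    """
    labels = []
    for f in features:
        name = min(prototypes, key=lambda p: math.dist(f, prototypes[p]))
        labels.append(name)
    return "".join(labels)

# Toy waveform: an oscillating segment followed by a constant segment.
signal = [1, -1, 1, -1, 2, 2, 2, 2]
feats = segment_features(signal, seg_len=4)
prototypes = {"a": (3.0, math.log(4.0)), "b": (0.0, math.log(16.0))}
sentence = to_primitive_string(feats, prototypes)
```

In a real system the two features would normally be scaled before the
nearest-prototype comparison, since a count and a logarithm are not on
comparable scales.
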
A waveform must be converted into a string of primitives (or a tree
or graph for higher-dimensional representations) before grammatical
inference and syntax analysis can take place. Since a waveform is a
one-dimensional signal, it is most natural to represent it by a string
of primitives. Various series expansions, for example, Fourier series,
and spectral analysis techniques have been used to approximate the
whole waveform. However, they are not suitable for syntactic analysis,
in which the relationships between one part of the waveform and the
others are significant. Although they can be used to characterize a
waveform segment, they are subject to constraints on the segment
length and the characteristics of the waveform. Pavlidis (1971, 1973,
1974) proposed a linguistic waveform analysis algorithm in which the
waveform is partitioned into several segments by linear approximation.
The basic idea is to minimize the number of segments by merging and
splitting while the error norm of each segment is kept below the error
tolerance. Horowitz (1975, 1977) extended this idea and added a peak
detection algorithm. He gave a syntactic definition of the positive
peak: a positive slope followed by a negative slope, or a positive
slope followed by a zero slope and then by a negative slope. A
negative peak can be defined in a similar way. He further constructed
a deterministic context-free grammar to recognize positive and
negative peaks. This approach is useful in waveform shape analysis
because of its simplicity. However, curvature information is not
included.
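
The syntactic peak definition quoted above can be illustrated with a
few lines of code. The sketch below is not Horowitz's grammar or
parser: it replaces them with regular expressions over a string of
slope symbols, which is enough to express the two peak patterns. The
symbol names `p`, `z`, `n`, the tolerance parameter and the use of
sample-to-sample differences in place of fitted line segments are all
assumptions made for this example.

```python
import re
from itertools import groupby

def slope_string(samples, tol=0.0):
    """Encode successive differences as p (positive), z (zero) or n (negative).

    Runs of the same symbol are collapsed, which mimics merging adjacent
    line segments that have the same slope sign.
    """
    symbols = []
    for a, b in zip(samples, samples[1:]):
        d = b - a
        symbols.append("p" if d > tol else "n" if d < -tol else "z")
    return "".join(key for key, _ in groupby(symbols))

# Positive peak: positive slope, optional zero slope, negative slope.
POSITIVE_PEAK = re.compile(r"pz?n")
# Negative peak: the mirror image.
NEGATIVE_PEAK = re.compile(r"nz?p")

def count_peaks(samples, tol=0.0):
    """Return (number of positive peaks, number of negative peaks)."""
    s = slope_string(samples, tol)
    return len(POSITIVE_PEAK.findall(s)), len(NEGATIVE_PEAK.findall(s))

samples = [0, 1, 2, 1, 0, 1, 1, 0]
encoded = slope_string(samples)
peaks = count_peaks(samples)
```

Because a positive-peak pattern starts with `p` and ends with `n`, two
positive peaks can never overlap, so a plain non-overlapping search is
sufficient for each pattern taken separately.
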

Another interesting representation of a waveform is the tree
structure, first introduced by Ehrich and Foith (1976). The peaks and
valleys of the waveform are detected and connected by a relational
tree. Sankar and Rosenfeld (1979) extended this idea by using the
concept of fuzzy connectedness. This method converts a one-dimensional
waveform into a two-dimensional tree structure. It is useful for
unipolar waveform analysis such as terrain analysis, but less helpful
for the analysis of bipolar waveforms such as the ECG wave and random
waveforms such as EEG and seismic waves. Another well-known method of
converting a one-dimensional signal into a two-dimensional image is
the spectrogram, which is used very often in speech analysis
(Flanagan, 1972). The spectrogram of a waveform is a plot of energy as
a function of time and frequency. Time and frequency are the
horizontal and vertical axes of the picture, and energy is represented
by gray-level intensity. This method needs special facilities to
convert a small segment of the time-domain signal into a
frequency-domain representation efficiently. Automatic interpretation
of the two-dimensional image is still a subject of study.

Giese et al. (1979) proposed a syntactic method to analyze EEG
signals. The EEG recording is divided into fixed-length segments, each
one second long. Seventeen features are computed from the spectrum of
each segment. A linear classifier is applied to classify the segments
into seven categories. An EEG grammar is manually constructed and a
bottom-up parser without backtracking is used for syntax analysis.

Stockman et al. (1976) proposed a syntactic pattern recognition
system for carotid pulse wave analysis. A set of thirteen primitives,
including various types of line segments and parabolas, is used. The
subpattern and primitive extraction starts from the most prominent
substructure, e.g., a long line segment, and then proceeds to less
prominent structures, located with respect to the more prominent ones,
in a prespecified order. A context-free grammar is manually
constructed and a top-down parser is used for syntax analysis.

De Mori (1972, 1977) proposed a syntactic method to recognize spoken
digits. First, each 20-msec segment is sent to a low-pass filter (LPF)
and a high-pass filter (HPF), and the zero-crossing intervals obtained
at the outputs of the two filters are classified into groups, eight
for the LPF and five for the HPF. Then, each spoken word is
represented pictorially on a two-dimensional plane. Finally, a
context-free grammar is constructed and bottom-up parsing is applied.
He further introduced syntactic methods for preprocessing, feature
extraction, emission and verification of hypotheses, and automatic
learning of spectral features.

Mottl' and Muchnik (1979) stated that there are two kinds of curve
sources which require the linguistic approach for analysis. One kind
of source corresponds to a phenomenon that is a process of many
stages. The curve consists of parts corresponding to the stages, and
the junctions of the parts are the times when the stages change. The
segmentation algorithm should divide the curve into a number of
adjacent parts characterized by the curve shape. Examples of this kind
are ECG and carotid pulse waveforms.

The other kind of source represents an object which is chiefly in an
invariable state and occasionally leaves it as a result of short-time
disturbances. For such a curve the segmentation should identify only
certain fragments which are regarded as informative, while the
remainder is left out. An example of this kind is the acoustical
diagnosis of internal-combustion engines (Mottl' and Muchnik, 1979).

We feel that the seismic wave is a third kind of curve which does not
fall exactly into either of the above two categories. Seismic waves
are influenced largely by the background as well as by the source.
Sometimes we are interested in the background, e.g., in oil
exploration; sometimes we are interested in the source, e.g., in
nuclear test detection. This will be discussed in the next section.


1.2.2  Pattern  Recognition  and 
Seismic  Signal  Analysis 

The major studies of seismic waves can be classified into the
following areas (Bath, 1979):

1. Seismic prospecting. This is the most attractive topic these days.
Seismic methods are applied to exploration for occurrences of oil, ore
bodies, minerals, etc. The reflection method and the refraction method
are the two major methods in use. It should be noted that it is not
possible, at least for now, to detect oil, etc., by seismic or any
other geophysical methods. It is only possible to discover geological
formations which may indicate the occurrence of oil, etc.

2. Vibration measurements. The effect of vibrations, due to mining,
traffic, etc., on various structures and human beings is studied. Such
measurements are usually made with accelerographs.

3. Stress measurements. Measurements of absolute stress have been
used to investigate the strength of building materials and stability
in mines.

4. Earthquake engineering. This field studies the effects of
earthquakes on all kinds of building structures, especially on crucial
structures such as nuclear power plants.

5. Earthquake prediction. A very important field, although no
significant progress has been made.

6. Establishing the nature of the source from the recording of
seismic waves. For example:

a) Nuclear test detection - detect secret underground nuclear
explosions.

b) Seismic detection of rockburst - locate small ruptures by seismic
methods.

Most of the existing pattern recognition applications in seismic
analysis concern the classification of earthquakes and nuclear
explosions. Chen (1978) proposed a statistical pattern recognition
method for classifying earthquakes and nuclear explosions from seismic
wave recordings. He emphasized the extraction of effective features.
Geophysical features such as complexity, spectral ratio and third
moment of frequency were tested first. He then used complex cepstrum,
orthogonal transformation, autocovariance features and short-time
spectral features for classification. His conclusion is that the
performance obtained from a single class of features is limited, and
that combining various features does not improve the performance
because of correlation. He suggested using both statistical and
structural features.

Tjostheim (1975, 1977, 1978) suggested that autoregressive
coefficients can be used as features. He showed that a seismic P-wave
can be represented by an autoregressive model of finite order. The
short-period P-wave is divided into five segments. The first three
autoregressive coefficients of each segment form the feature vector.
The combination of different segments is used to achieve better
performance. Dividing the whole P-wave into several segments is an
improvement, but still no structural information is used.
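
To make the feature construction just described concrete, the sketch
below estimates the first three autoregressive coefficients of each of
five equal-length segments and concatenates them into a 15-element
vector. It is only an illustration: the Yule-Walker estimate solved by
the Levinson-Durbin recursion is a common textbook choice and is an
assumption here, not necessarily the estimator used in the cited work,
and the function names and the synthetic test signal are hypothetical.

```python
import random

def autocorrelation(x, max_lag):
    """Biased sample autocorrelation r[0..max_lag] of the mean-removed signal."""
    n = len(x)
    mean = sum(x) / n
    c = [v - mean for v in x]
    return [sum(c[t] * c[t + k] for t in range(n - k)) / n
            for k in range(max_lag + 1)]

def ar_coefficients(x, order=3):
    """Yule-Walker estimate of a_1..a_order in x[t] = sum_k a_k x[t-k] + e[t]."""
    r = autocorrelation(x, order)
    a = [0.0] * (order + 1)   # a[0] is unused
    err = r[0]                # prediction error power (zero for a constant segment)
    for m in range(1, order + 1):
        # Reflection coefficient of the Levinson-Durbin recursion.
        k_m = (r[m] - sum(a[k] * r[m - k] for k in range(1, m))) / err
        prev = a[:]
        a[m] = k_m
        for k in range(1, m):
            a[k] = prev[k] - k_m * prev[m - k]
        err *= 1.0 - k_m * k_m
    return a[1:]

def ar_feature_vector(p_wave, n_segments=5, order=3):
    """Concatenate the AR coefficients of equal-length consecutive segments."""
    seg_len = len(p_wave) // n_segments
    vector = []
    for s in range(n_segments):
        vector.extend(ar_coefficients(p_wave[s * seg_len:(s + 1) * seg_len], order))
    return vector

# Synthetic first-order autoregressive signal, used purely as demonstration data.
rng = random.Random(0)
x = [0.0]
for _ in range(4999):
    x.append(0.8 * x[-1] + rng.gauss(0.0, 1.0))
features = ar_feature_vector(x)
```

For this synthetic signal the first coefficient of every segment should
come out close to the generating value 0.8, while the second and third
should be near zero.
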

Sarna and Stark (1980) also used autoregressive modeling for feature
extraction, but the k-nearest-neighbor rule for classification. When
applied to artificial data, this procedure gave excellent results;
however, the results on real seismic / explosion data were very poor.
This may indicate that autoregressive modeling is not suitable for
real seismic waves. Most of these studies concentrated on feature
selection, and only simple decision-theoretic techniques have been
used. Syntactic pattern recognition, however, appears to be quite
promising in this area, because it uses the structural information of
the seismic wave, which is very important in the analysis.

Syntactic pattern recognition has been pointed out as a promising
approach to seismic classification (Chen, 1978). While quite a few
statistical approaches have been proposed, we are the first to apply
syntactic approaches to this area. With only very simple features, our
approaches attain encouraging results comparable to those of the best
statistical approaches. Our approaches also differ from the foregoing
syntactic methods in the treatment of primitive selection and grammar
construction. A clustering procedure together with some decision
criteria constitutes the primitive selection algorithm in our
approach, whereas heuristic approaches were used by others. Our
pattern grammars are inferred from the training samples, whereas most
pattern grammars for signal analysis are constructed manually. An
attributed grammar for our specific application is proposed, which
could significantly reduce the grammar size and increase the
flexibility of description. Finally, VLSI architectures are proposed
for seismic classification, which include feature extraction,
primitive recognition and string matching. Our string matcher differs
from many contemporary implementations, which perform exact matching
and are therefore not suitable for pattern recognition applications;
the details will be discussed in Chapter V. With our VLSI
architectures and pipelined data flow, results can be produced at a
constant rate, i.e., with constant time complexity. Although these
VLSI systems are developed for seismic classification, they can be
applied to other similar applications.

Since the energy crisis, the search for oil has become more urgent
than ever before. Seismological methods use small chemical explosions
to generate seismic waves. These seismic waves penetrate down into the
crust and are reflected by the boundaries between different layers.
Analysis of the reflected seismic waves can provide clues about the
local crust structure and oil deposits (Bath, 1979). Bois (1981) has
applied pattern recognition techniques to petroleum prospecting.

Another potential field of application for pattern recognition is
damage assessment in structural (earthquake) engineering (Fu and Yao,
1979; Yao, 1979). During a strong earthquake, the accelerometers in a
large building structure record the acceleration of the building.
From these recordings (and other information) the damage of the
building, as far as the structural damage reflected in the seismic
recordings is concerned, can be classified into certain classes.


1.3  Organization  of  Thesis 


We have seen some examples which apply the syntactic approach to
digital signal analysis. Those systems are usually heuristically
constructed and therefore application dependent. We have also reviewed
several statistical pattern recognition approaches to seismic
classification. In this research we study the application of syntactic
pattern recognition to seismic classification. Two approaches are
investigated: one uses grammatical inference and error-correcting
parsing; the other computes string-to-string distances and applies the
nearest-neighbor decision rule. Chapter II discusses string similarity
measures, which are hierarchically classified, and classification
procedures, which include error-correcting parsing and the
nearest-neighbor decision rule. Chapter III presents the procedures
and experimental results of syntactic pattern recognition applied to
seismic discrimination, i.e., earthquake / explosion classification,
and to damage assessment. Attributed grammars, which can reduce the
complexity of the pattern grammar, are discussed in Chapter IV.
Chapter V discusses VLSI architectures for syntactic seismic
classification, which include feature extraction, primitive
recognition and string matching. Chapter VI gives the summary,
conclusions and recommendations for future research.

Although our experiments are on seismic discrimination, these
approaches can be applied to other similar applications. For example,
pattern recognition methods have been used to determine the nature of
reservoirs in petroleum prospecting (Bois, 1981). The unknown
reservoir is compared with a known reservoir, for example, one which
contains oil. Features are computed from the seismic traces of the two
reservoirs and plotted on a two-dimensional plane. The similarity
between the two reservoirs is determined by the distribution of the
two clusters. Since the nature of a reservoir is characterized by its
seismic traces, it is possible to compare the seismic traces of the
two reservoirs directly.

Levenshtein distance has recently been applied to speech recognition
(Okuda, Tanaka and Kasai, 1976; Ackroyd, 1980). It can be used to
correct the letter or phoneme sequences generated by the recognition
machine, or it can be built directly into the recognition procedures.
Our VLSI string matcher can be applied in both cases. Furthermore, our
primitive recognizer can also be applied to the case in Ackroyd
(1980). Mottl' and Muchnik (1979) proposed a linguistic approach to
the analysis of experimental curves in which a special-purpose
language is constructed to describe the pattern. The distance between
two strings is defined as the minimum number of insertions and
deletions of symbols, which is in essence equivalent to the
Levenshtein distance.
CHAPTER  II

SIMILARITY  MEASURES  AND  RECOGNITION 
PROCEDURES  FOR  STRING  PATTERNS 

2.1  Introduction

One important premise in pattern recognition is that we can measure
the similarities between patterns. We say that a pattern belongs to a
class if and only if it is more similar to the members of this class
than to the members of other classes. These measures can be nominal,
where numbers are used only as names; ordinal, where only rank orders
have meaning; interval, where the separation between numbers is
meaningful; or ratio, where a natural zero exists. Distance is a
popular candidate for a similarity measure. If the pattern is
represented by a vector, as in the statistical approach, the Euclidean
distance is usually used as the similarity measure. The Euclidean
distance has many nice properties; for example, it is symmetric and
invariant under translation and rotation.

In the syntactic approach, patterns are represented by strings, trees
or graphs; similarity measures must therefore be available for these
syntactic patterns. Several similarity measures have been proposed to
tackle this problem (Fu, 1977; Lu and Fu, 1977, 1978b). Since our
major interest is in string patterns, we will review some well-known
string similarity measures, discuss their properties and define a
hierarchy of string distances.

String similarity measures can be applied to string matching in
information storage and retrieval (Hall and Dowling, 1980), speech
recognition (Sakoe and Chiba, 1978), clustering of string patterns (Fu
and Lu, 1977) and the nearest-neighbor decision rule for string
classification. They are also used in error-correcting parsing. Given
a string y and a language L(G), an error-correcting parser (ECP)
generates a parse for a string x, where $x \in L(G)$ and x is most
similar to y.
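
The error-correcting idea can be illustrated for the simplest case, in
which the language is accepted by a deterministic finite-state
automaton and every insertion, deletion and substitution costs one
unit. The sketch below is not the parsing algorithm developed in this
chapter; it is a shortest-path search (Dijkstra's algorithm) over pairs
of input position and automaton state, and it returns the nearest
accepted string rather than a full parse. The function name, the
transition-table format and the example automaton are assumptions made
for this illustration.

```python
import heapq

def nearest_in_language(y, transitions, start, accepting):
    """Return (distance, x) where x is an accepted string closest to y.

    `transitions` maps (state, symbol) -> next_state for a deterministic
    automaton (hypothetical format). All three edit operations cost 1.
    """
    n = len(y)
    best = {(0, start): 0}
    heap = [(0, 0, start, "")]            # (cost, symbols of y consumed, state, x so far)
    while heap:
        cost, i, q, x = heapq.heappop(heap)
        if cost > best.get((i, q), float("inf")):
            continue                      # stale queue entry
        if i == n and q in accepting:
            return cost, x
        moves = []
        if i < n:
            moves.append((cost + 1, i + 1, q, x))               # y[i] has no counterpart in x
        for (state, symbol), nxt in transitions.items():
            if state != q:
                continue
            moves.append((cost + 1, i, nxt, x + symbol))        # x has a symbol missing from y
            if i < n:
                step = 0 if symbol == y[i] else 1               # match or substitution
                moves.append((cost + step, i + 1, nxt, x + symbol))
        for c, j, s, xs in moves:
            if c < best.get((j, s), float("inf")):
                best[(j, s)] = c
                heapq.heappush(heap, (c, j, s, xs))
    return None                           # the automaton accepts no string

# Example automaton for the language ab, abab, ababab, ...
T = {(0, "a"): 1, (1, "b"): 2, (2, "a"): 1}
exact = nearest_in_language("abab", T, 0, {2})
repaired = nearest_in_language("abb", T, 0, {2})
```

Here `exact` has distance 0, while `repaired` has distance 1: the noisy
string "abb" is one edit away from both "ab" and "abab", and the search
returns one of them.
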

Section 2 of this chapter discusses various types of string
similarity measures, including both nonstochastic and stochastic
models. String distances are classified into general string distances
and special string distances. General string distances are based on
insertion, deletion and substitution transformations. Special string
distances are those not based on these transformations. One example is
the time warping distance in speech analysis. We propose another
special distance computation for damage assessment. A hierarchy of
general string distances is also defined. Section 3 describes
error-correcting parsing algorithms which do not require expanded
grammars. Section 4 discusses and compares two recognition procedures
for syntactic patterns, namely, error-correcting parsing and the
nearest-neighbor rule, and Section 5 gives the conclusion.

This chapter emphasizes the symmetric property of string similarity
measures. Symmetry is not an issue when the Euclidean distance is used
as the similarity measure, since the Euclidean distance is always
symmetric. This is not true of string similarity measures in general,
especially when a weighted distance is used. We also give
error-correcting parsing algorithms that use symmetric string
similarity measures, a problem which cannot be solved by any other
existing parsing algorithm.


2.2  Similarity  Measures  of  Strings 

String similarity measures can be defined in terms of two different
concepts, i.e., the distance concept and the likelihood concept. The
former is for nonstochastic models and the latter is for stochastic
models. Consider string $x = a_1 a_2 \cdots a_n$ and string
$y = b_1 b_2 \cdots b_m$. The string similarity measure between x and
y is defined as the distance or probability that string y is
transformed from string x. The distance or probability of
transformation from x to y is usually different from that of
transformation from y to x, which results in an asymmetric similarity
measure, i.e., the similarity between x and y is different from the
similarity between y and x. This is a big disadvantage in some
applications, for example, in string clustering, where the
inconsistency in the similarity measure makes the outcome
inconsistent. Therefore we want to discuss the symmetric property of
the string similarity measure.


2.2.1  Similarity  Measures  based 
on  Distance  Concept 

Distance measures between strings have been proposed for more than a
decade and have appeared often in the literature (see Fu, 1982). It is
known (Okuda, et al., 1976) that the Weighted Levenshtein Distance
(WLD) is more accurate in the correction of string errors than the
abbreviation method (Blair, 1960), the ordered key letters method
(Tanaka and Kasai, 1972) and the elastic matching method (Levenshtein,
1966), all of which apply substitution, insertion and deletion to
string symbols. Fu and Lu (1977) classified the weight metrics into
three categories, but did not consider the symmetric property of the
metric. We extend this idea and include a discussion of the symmetric
property.


A.  General  String  Distances 


One of the basic string distance definitions is the Levenshtein
distance (Levenshtein, 1966). The Levenshtein distance between strings
x and y, $x, y \in \Sigma^*$, denoted $d_L(x,y)$, is defined as the
smallest number of transformations required to derive string y from
string x. The transformations include insertion, deletion and
substitution of terminal symbols.

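Definition 2.1 For any two strings $x, y \in \Sigma^*$, we can define a
sequence of transformations $J = \{T_1, T_2, \ldots, T_n\}$, $n \ge 0$,
$T_i \in \{T_S, T_D, T_I\}$ for $1 \le i \le n$, such that
$y \in J(x)$. The transformations $T_S$, $T_D$ and $T_I$ are defined as
follows:

(1) substitution transformation, $T_S$:
    $\omega_1 a \omega_2 \overset{T_S}{\vdash} \omega_1 b \omega_2$ for all $a, b \in \Sigma$.

(2) deletion transformation, $T_D$:
    $\omega_1 a \omega_2 \overset{T_D}{\vdash} \omega_1 \omega_2$ for all $a \in \Sigma$.

(3) insertion transformation, $T_I$:
    $\omega_1 \omega_2 \overset{T_I}{\vdash} \omega_1 a \omega_2$ for all $a \in \Sigma$.

where $\omega_1, \omega_2 \in \Sigma^*$.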


Definition 2.2 The Levenshtein distance $d_L(x,y)$ is defined as

$d_L(x,y) = \min_J \{ k_J + m_J + n_J \}$

where $k_J$, $m_J$ and $n_J$ are respectively the number of
substitution, deletion and insertion transformations in J.

Definition 2.3 A distance $d(x,y)$ between two strings
$x, y \in \Sigma^*$ is symmetric if and only if $d(x,y) = d(y,x)$.

Since all insertion, substitution and deletion transformations are
counted equally, the Levenshtein distance is symmetric. This is
equivalent to assigning weight 1 to each transformation. We call these
weights type 0 weights.

The Levenshtein distance can be computed by dynamic programming on a
grid matrix as shown in Figure 2.1. The partial distance $\delta[i,j]$,
which denotes the minimum distance from point (0, 0) to point (i, j),
can be computed from the partial distances $\delta[i,j-1]$,
$\delta[i-1,j-1]$ and $\delta[i-1,j]$ as shown in Figure 2.2. The total
distance is simply $\delta[n,m]$, where n is the length of the
reference string and m is the length of the test string.

Since the minimum distance path is unlikely to pass through some areas
of the grid matrix, for example the upper left corner and the lower
right corner, a global path constraint can be imposed to save
computation time. Figure 2.3 shows a window constraint such that only
those points

$(i,j)$ with $|i - \frac{n}{m} j| \le r$, where $0 \le j \le m$ and r is a selected constant,

are subject to distance computation. Algorithm 2.1 is for general string
distance computation with a global path constraint.

Algorithm 2.1. Computation of general string distance with
global path constraint

Input: Two strings $x = a_1 a_2 \cdots a_n$ and $y = b_1 b_2 \cdots b_m$, where
$a_i, b_j \in \Sigma$ for all $1 \le i \le n$, $1 \le j \le m$,
and a constant r for the global path constraint.

Output: The general string distance $d(x,y)$.

Method:

(1) $\delta[0,0] := 0$;

(2) for $i := 1$ to $r$ do $\delta[i,0] := \delta[i-1,0] + \lambda_i$;

(3) for $j := 1$ to $\frac{m}{n} r$ do $\delta[0,j] := \delta[0,j-1] + \lambda_j$;

(4) for $j := 1$ to $m$ do begin

      $i_1 := \frac{n}{m} j - r$;

      $i_2 := \frac{n}{m} j + r$;

      for $i := i_1$ to $i_2$ do

        if $(i \ge 1)$ and $(i \le n)$ then $\delta[i,j] := \min(i,j)$;

      (* $\min(i,j)$ is a function for local distance computation *)

    end;

(5) $d(x,y) := \delta[n,m]$;


We use a function $\min(i,j)$ in Algorithm 2.1 to compute the local
distance. The function $\min(i,j)$ can be written separately to match
different local distance constraints and returns a distance value. For
the Levenshtein distance,
$\min(i,j) = \min\{\delta[i-1,j] + 1,\ \delta[i,j-1] + 1,\ \delta[i-1,j-1] + 1\}$
if $a_i \ne b_j$; otherwise $\min(i,j) = \delta[i-1,j-1]$. This
arrangement is flexible, since the dynamic programming portion never
needs to change; only a different function $\min(i,j)$ is used for
different applications.
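
The separation between the dynamic programming loop and the local
distance function can be sketched in Python as follows. This is only an
illustrative sketch, not the thesis implementation: the names
`string_distance` and `levenshtein_local` are hypothetical, the boundary
weights in steps (2) and (3) are assumed to be 1 (the Levenshtein case),
and passing `r=None` is an added convenience that disables the window.

```python
import math


def levenshtein_local(delta, i, j, x, y):
    """Local distance function min(i, j) for the Levenshtein distance."""
    sub = 0 if x[i - 1] == y[j - 1] else 1
    return min(delta[i - 1][j] + 1,        # delete a_i
               delta[i][j - 1] + 1,        # insert b_j
               delta[i - 1][j - 1] + sub)  # substitute b_j for a_i, or match


def string_distance(x, y, r=None, local=levenshtein_local):
    """General string distance with the window |i - (n/m) j| <= r.

    Assumption: boundary cells use unit weights, as for Levenshtein.
    """
    n, m = len(x), len(y)
    if r is None:
        r = n + m                      # wide enough to cover the whole grid
    slope = n / m if m else 0.0
    delta = [[math.inf] * (m + 1) for _ in range(n + 1)]
    delta[0][0] = 0
    for i in range(1, n + 1):          # step (2): first column inside the window
        if i <= r:
            delta[i][0] = delta[i - 1][0] + 1
    for j in range(1, m + 1):          # step (3): first row inside the window
        if slope * j <= r:
            delta[0][j] = delta[0][j - 1] + 1
    for j in range(1, m + 1):          # step (4): interior cells inside the window
        for i in range(1, n + 1):
            if abs(i - slope * j) <= r:
                delta[i][j] = local(delta, i, j, x, y)
    return delta[n][m]                 # step (5)
```

Cells outside the window keep an infinite distance, so a path can never
pass through them. Swapping in a different `local` function changes the
distance without touching the loop.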

The Levenshtein distance is not powerful enough for many pattern
recognition applications, although it may be sufficient for string
matching in information retrieval (Hall and Dowling, 1980). Fu and Lu
(1977) proposed a weighted Levenshtein distance (WLD) in which different
weights are associated with different transformations and terminals.

We can make the string distance definition more flexible and practical
by assigning different weights to different transformations and/or
terminals. There are at least three possible cases. In the first case,
different weights are assigned to different transformations but all
terminals are treated equally. We call these weights type 1 weights.
Here are the transformations:

(1) $\omega_1 a \omega_2 \overset{T_S,\ \sigma}{\vdash} \omega_1 b \omega_2$ for all $a, b \in \Sigma$, $a \ne b$,
    where $\sigma$ is the cost of substituting b for a; $\sigma = 0$ when b = a.

(2) $\omega_1 a \omega_2 \overset{T_D,\ \gamma}{\vdash} \omega_1 \omega_2$ for all $a \in \Sigma$,
    where $\gamma$ is the cost of deleting a.

(3) $\omega_1 \omega_2 \overset{T_I,\ \rho}{\vdash} \omega_1 a \omega_2$ for all $a \in \Sigma$,
    where $\rho$ is the cost of inserting a.

where $\omega_1, \omega_2 \in \Sigma^*$.

The  distance  defined  by  these  transformations  is  called  type  1 
weighted  Levenshtein  distance. 

Definition 2.4: The type 1 weighted Levenshtein distance $d_{W1}(x,y)$ is
defined as

$d_{W1}(x,y) = \min_J \{ \sigma k_J + \gamma m_J + \rho n_J \}$

where $k_J$, $m_J$ and $n_J$ are defined the same as in Definition 2.2.

Theorem 2.1 $d_{W1}(x,y)$ is symmetric, i.e., $d_{W1}(x,y) = d_{W1}(y,x)$,
if and only if $\gamma = \rho$.

The WLD $d_{W1}(x,y)$ can be computed by Algorithm 2.1 where
$\min(i,j) = \min\{\delta[i,j-1] + \rho,\ \delta[i-1,j-1] + \sigma,\ \delta[i-1,j] + \gamma\}$
as shown in Figure 2.4(a). The weights in steps (2) and (3) should also
be changed accordingly.

In the second case, different weights are assigned to different
transformations and terminals, but the weights associated with the
terminals are context-independent. We call these weights type 2
weights. We have the following transformations:

(1) $\omega_1 a \omega_2 \overset{T_S,\ S(a,b)}{\vdash} \omega_1 b \omega_2$ for all $a, b \in \Sigma$, $a \ne b$,
    where $S(a,b)$ is the cost of substituting b for a, $S(a,a) = 0$.

(2) $\omega_1 a \omega_2 \overset{T_D,\ D(a)}{\vdash} \omega_1 \omega_2$ for all $a \in \Sigma$,
    where $D(a)$ is the cost of deleting a.

(3) $\omega_1 \omega_2 \overset{T_I,\ I(a)}{\vdash} \omega_1 a \omega_2$ for all $a \in \Sigma$,
    where $I(a)$ is the cost of inserting a.

where $\omega_1, \omega_2 \in \Sigma^*$.


The  distance  defined  by  these  transformations  is  called  type  2 
weighted  Levenshtein  distance. 

Definition 2.5. The type 2 weighted Levenshtein distance $d_{W2}(x,y)$ is
defined as

$d_{W2}(x,y) = \min_J \{ \sum S_J(a,b) + \sum D_J(a) + \sum I_J(b) \}$

where $a, b \in \Sigma$ and J is the sequence of transformations used to
derive y from x.

Theorem 2.2 $d_{W2}(x,y)$ is symmetric if and only if $D(a) = I(a)$ and
$S(a,b) = S(b,a)$ for all $a, b \in \Sigma$.

The type 2 WLD $d_{W2}(x,y)$ can also be computed by Algorithm 2.1
where
$\min(i,j) = \min\{\delta[i,j-1] + I(b_j),\ \delta[i-1,j-1] + S(a_i,b_j),\ \delta[i-1,j] + D(a_i)\}$
as shown in Figure 2.4(b).
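
The effect of the symmetry condition in Theorem 2.2 can be illustrated
with a small Python sketch of the type 2 recurrence. The function name
`wld_type2` and the example weights are hypothetical and chosen only for
illustration; no window constraint is applied. Type 1 weights are the
special case in which the three weight functions are constants.

```python
def wld_type2(x, y, S, D, I):
    """Type 2 weighted Levenshtein distance from x to y.

    S(a, b): cost of substituting b for a (taken as 0 when a == b),
    D(a): cost of deleting a, I(b): cost of inserting b.
    """
    n, m = len(x), len(y)
    delta = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):                  # boundary: delete a_1 .. a_i
        delta[i][0] = delta[i - 1][0] + D(x[i - 1])
    for j in range(1, m + 1):                  # boundary: insert b_1 .. b_j
        delta[0][j] = delta[0][j - 1] + I(y[j - 1])
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            a, b = x[i - 1], y[j - 1]
            sub = 0.0 if a == b else S(a, b)
            delta[i][j] = min(delta[i][j - 1] + I(b),
                              delta[i - 1][j - 1] + sub,
                              delta[i - 1][j] + D(a))
    return delta[n][m]


def unit_sub(a, b):
    """Illustrative symmetric substitution weight."""
    return 1.0


# Deletion costs more than insertion, so the distance is asymmetric.
d_xy = wld_type2("abc", "ab", unit_sub, lambda a: 2.0, lambda b: 1.0)
d_yx = wld_type2("ab", "abc", unit_sub, lambda a: 2.0, lambda b: 1.0)

# Equal deletion and insertion weights restore symmetry.
s_xy = wld_type2("abc", "ab", unit_sub, lambda a: 1.5, lambda b: 1.5)
s_yx = wld_type2("ab", "abc", unit_sub, lambda a: 1.5, lambda b: 1.5)
```

With these assumed weights, deriving "ab" from "abc" needs one deletion
while the reverse needs one insertion, which is why the two directions
differ unless D and I agree.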

In the third case, the weights associated with the terminals for
insertion and deletion are context-dependent. We call these weights
type 3 weights. We have the following transformations:

(1) $\omega_1 a \omega_2 \overset{T_S,\ S(a,b)}{\vdash} \omega_1 b \omega_2$ for all $a, b \in \Sigma$, $a \ne b$,
    where $S(a,b)$ is the cost of substituting b for a, $S(a,a) = 0$.

(2) $\omega_1 a b \omega_2 \overset{T_D,\ D(b,a)}{\vdash} \omega_1 b \omega_2$ for all $a \in \Sigma$, $b \in \Sigma \cup \{\&\}$,
    where $D(b,a)$ is the cost of deleting a in front of b.

(3) $\omega_1 a \omega_2 \overset{T_I,\ I(a,b)}{\vdash} \omega_1 b a \omega_2$ for all $b \in \Sigma$, $a \in \Sigma \cup \{\&\}$,
    where $I(a,b)$ is the cost of inserting b in front of a.

where $\omega_1, \omega_2 \in \Sigma^*$.

Transformation (2) is defined this way for the sake of symmetry. As
mentioned earlier, the symmetric property is important in distance
computation; otherwise the distance between two strings is not unique
and depends on which string is selected as the reference and which as
the test string. In string recognition this may not be a problem, since
we know the reference and test strings. In string clustering, however,
the problem does occur, since every string has to be treated equally.
Context-dependent weights are also useful in other applications, for
example speech recognition, where the repetition of some symbols is
considered legal. For instance, the strings x and y, where

x = aaabbc
y = aabbcc

may be considered identical, i.e., at zero distance. This is easily
implemented by letting $I(a,a) = D(a,a) = 0$ for all $a \in \Sigma$.

The distance defined by these transformations is called the type 3
weighted Levenshtein distance. These transformations are similar to
those proposed by Fu and Lu (1977) but differ in two respects. First, a
right endmarker "&" is used for both the reference and test strings, so
no additional transformations are needed to handle insertion or
deletion at the end point. From now on, we use $\Sigma'$ to represent
$\Sigma \cup \{\&\}$. Second, the weights associated with the deletion
transformation are context-dependent.

Definition 2.6: The type 3 weighted Levenshtein distance $d_{W3}(x,y)$ is
defined as

$d_{W3}(x,y) = \min_J \{ \sum S_J(a,b) + \sum D_J(c,a) + \sum I_J(c,a) \}$

where $a, b \in \Sigma$, $c \in \Sigma'$ and J is the sequence of
transformations used to derive y from x.

Theorem 2.3 $d_{W3}(x,y)$ is symmetric if and only if $D(a,b) = I(a,b)$ and
$S(a,b) = S(b,a)$ for all $b \in \Sigma$, $a \in \Sigma'$.


Before deriving an algorithm for computing the type 3 WLD, we have to
consider one more problem. Since the weights are context-dependent,
the order of insertion and deletion transformations can no longer be
ignored.

Example 2.1: Let $y = \alpha b c d a \beta$ and $x = \alpha a \beta$,
$x, y \in \Sigma^*$, $\alpha, \beta \in (\Sigma \cup \{\&\})^*$. Then the
transformations from x to y can be

$\alpha a \beta \overset{I(a,b)}{\vdash} \alpha b a \beta \overset{I(a,c)}{\vdash} \alpha b c a \beta \overset{I(a,d)}{\vdash} \alpha b c d a \beta$, or

$\alpha a \beta \overset{I(a,b)}{\vdash} \alpha b a \beta \overset{I(a,d)}{\vdash} \alpha b d a \beta \overset{I(d,c)}{\vdash} \alpha b c d a \beta$, or

$\alpha a \beta \overset{I(a,c)}{\vdash} \alpha c a \beta \overset{I(c,b)}{\vdash} \alpha b c a \beta \overset{I(a,d)}{\vdash} \alpha b c d a \beta$, or

$\alpha a \beta \overset{I(a,c)}{\vdash} \alpha c a \beta \overset{I(a,d)}{\vdash} \alpha c d a \beta \overset{I(c,b)}{\vdash} \alpha b c d a \beta$, or

$\alpha a \beta \overset{I(a,d)}{\vdash} \alpha d a \beta \overset{I(d,b)}{\vdash} \alpha b d a \beta \overset{I(d,c)}{\vdash} \alpha b c d a \beta$, or

$\alpha a \beta \overset{I(a,d)}{\vdash} \alpha d a \beta \overset{I(d,c)}{\vdash} \alpha c d a \beta \overset{I(c,b)}{\vdash} \alpha b c d a \beta$.

There are six different transformations available for Example 2.1.
In general, there are k! different transformations that insert k
symbols in front of a specific symbol, all with the same final result.
In Example 2.1 there is no reason to assume that the order of insertion
is "b followed by c followed by d". Strictly, the minimum cost
transformation should therefore be determined from those six
transformations. However, this makes the computation much more
complicated, and the small gain from the true minimum cost
transformation may not pay off the extra computation. If a suboptimal
solution is acceptable, we can fix one type of transformation, i.e.,
the first one in Example 2.1.
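
The six insertion orders of Example 2.1 and their context-dependent
weights can be enumerated with a short Python sketch. The helper names
`insertion_steps` and `cheapest_order` are hypothetical, the inserted
symbols are assumed to be distinct, and the weight function used in the
usage lines is invented purely for illustration.

```python
from itertools import permutations


def insertion_steps(inserted, anchor, order):
    """List the (context, symbol) pairs I(context, symbol) that are used when
    the symbols of `inserted` are placed in front of `anchor` in `order`.

    `inserted` gives the final left-to-right arrangement, e.g. "bcd" for
    the target ...bcda...; its symbols are assumed to be distinct.
    """
    present = set()
    steps = []
    for sym in order:
        pos = inserted.index(sym)
        # The context is the nearest symbol to the right that already exists.
        context = next((s for s in inserted[pos + 1:] if s in present), anchor)
        steps.append((context, sym))
        present.add(sym)
    return steps


def cheapest_order(inserted, anchor, cost):
    """Exhaustively search all k! orders; return (total, order, steps)."""
    best = None
    for order in permutations(inserted):
        steps = insertion_steps(inserted, anchor, order)
        total = sum(cost(ctx, sym) for ctx, sym in steps)
        if best is None or total < best[0]:
            best = (total, order, steps)
    return best


# Hypothetical weights: inserting in front of 'a' costs 1.0, otherwise 0.5.
example_cost = lambda ctx, sym: 1.0 if ctx == "a" else 0.5
first = insertion_steps("bcd", "a", ("b", "c", "d"))   # first line of Example 2.1
best_total, best_order, best_steps = cheapest_order("bcd", "a", example_cost)
```

The exhaustive search grows as k!, which is the extra computation the
text argues against; fixing the first order avoids it at the price of a
possibly suboptimal total.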


The cases for deletion are similar to those for insertion. In
Example 2.1, the transformation from y to x corresponding to the first
one is as follows:

$\alpha b c d a \beta \overset{D(a,d)}{\vdash} \alpha b c a \beta \overset{D(a,c)}{\vdash} \alpha b a \beta \overset{D(a,b)}{\vdash} \alpha a \beta$

Note that the symmetric property is preserved here.

We can use Algorithm 2.1 to compute the type 3 WLD $d_{W3}(x,y)$
where
$\min(i,j) = \min\{\delta[i,j-1] + I(a_{i+1},b_j),\ \delta[i-1,j-1] + S(a_i,b_j),\ \delta[i-1,j] + D(b_{j+1},a_i)\}$
as shown in Figure 2.4(c). The weights in steps (2) and (3) should also
be modified.

We can define a hierarchy on the four types of distances: type 0
distance is a proper subset of type 1 distance, type 1 distance is a
proper subset of type 2 distance, and type 2 distance is a proper
subset of type 3 distance. Together they cover all the general string
distances based on insertion, deletion and substitution
transformations. However, there are some distance measures that are not
based on insertion, deletion and substitution transformations. We call
them special string distances.

B.  Special  String  Distance 

Special string distances are distances that apply only to specific
applications and are not based on insertion, deletion and substitution
transformations. One example is dynamic time warping for speech
recognition; another is the modified dynamic time warping for damage
assessment.

In spoken word recognition, the recorded speech signal differs from
utterance to utterance, even for the same word spoken by the same
person. Moreover, the time difference between speech patterns is
nonlinear, so a nonlinear matching algorithm is required in order to
obtain good recognition results. A special technique called time
warping has been proposed by Sakoe and Chiba (1978). An example is
shown in Figure 2.5, where $x = a_1 a_2 \ldots a_n$ is the reference
pattern and $y = b_1 b_2 \ldots b_m$ is the test pattern. Each component
$a_i$, $b_j$ of strings x, y can be a feature vector or a scalar which
represents a signal segment. (The position of each component $a_i$,
$b_j$ in the grid matrix is slightly different from what we have used
previously.)

Definition 2.7 The time warping distance between strings x and y is

$d_{TW}(x,y) = \sum_{k=1}^{K} d(c(k))$

where

$d(c(k)) = d(i(k), j(k)) = \| a_{i(k)} - b_{j(k)} \|$,

k is the index of the common time axis, and K is the number of points
on the warping function.

Two major differences between time warping and the general
string-to-string distance can be pointed out immediately. First, one
component, i.e., symbol, can be used more than once in the warping
function. For example, component $a_4$ in Figure 2.6 has been compared
with both $b_3$ and $b_4$. Second, components may be skipped without
any cost. Although the general string distance can be modified by
letting $I(a,a) = 0$ and $D(a,b) = 0$ for $a, b \in \Sigma$ to simulate
time warping, there are other restrictions on the time warping
function, for example the slope constraint. The slope constraint
eliminates excessively steep or gentle gradients from the warping
function. For details of slope constraints and the computation of the
time warping distance, see Sakoe and Chiba (1978). The weights used for
time warping are different from those for insertion, deletion and
substitution, and can be tailored to fit specific applications.

A path constraint similar to that of the general string distance (see
Figure 2.3) can also be applied here, i.e.,

$|i(k) - \frac{n}{m} j(k)| \le r$

where r is the path width. This prevents the warping function from
producing unrealistic matches. Sakoe and Chiba (1978) proposed the path
constraint

$|i(k) - j(k)| \le r$

This window lies along the diagonal axis $i(k) = j(k)$. Since the
dynamic programming proceeds from point (0,0) to point (n,m), the
window should lie along the diagonal axis $i(k) = \frac{n}{m} j(k)$ as
shown in Figure 2.3. Sakoe and Chiba (1978) have shown that the
symmetric time warping distance gives higher recognition accuracy than
the asymmetric time warping distance.
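
A minimal Python sketch of a time warping distance with the window
$|i - \frac{n}{m} j| \le r$ is given below. It is a simplified
illustration only: scalar components are assumed, the three unweighted
steps (left, lower left, below) are used, and the slope constraints and
step weights of Sakoe and Chiba (1978) are omitted. The name
`time_warping_distance` is hypothetical.

```python
import math


def time_warping_distance(x, y, r=None):
    """Accumulated |a_i - b_j| along the best warping path.

    x, y: non-empty sequences of numbers.
    r: half-width of the window around the axis i = (n/m) j, or None
       for no window (an added convenience, not part of the source).
    """
    n, m = len(x), len(y)
    if n == 0 or m == 0:
        raise ValueError("both patterns must be non-empty")
    slope = n / m
    delta = [[math.inf] * (m + 1) for _ in range(n + 1)]
    delta[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            if r is not None and abs(i - slope * j) > r:
                continue                      # outside the window
            d = abs(x[i - 1] - y[j - 1])      # local distance d(i, j)
            delta[i][j] = d + min(delta[i - 1][j],      # a_i reuses b_j
                                  delta[i][j - 1],      # b_j reuses a_i
                                  delta[i - 1][j - 1])  # both advance
    return delta[n][m]
```

Because a component may be reused, `[1, 2, 3]` and `[1, 2, 2, 3]` are at
distance zero, which the insertion, deletion and substitution distances
would only allow with specially chosen weights.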

In some applications, specifically string distance computation for
damage assessment, one component in one string is equivalent to the
sum of several components in another string. For example, in
Figure 2.6 the top two segments may come from the seismic recordings of
a building without damage, while the bottom two segments may come from
the same building with a certain degree of damage. If we consider each
component in Figure 2.6 as an appropriate measurement, then
$b_3 = a_3 + a_4 + a_5 + a_6 + a_7$ and $d_2 = c_2 + c_3 + c_4$, since
$b_3$ is a distortion of $a_3$, $a_4$, $a_5$, $a_6$ and $a_7$, and
$d_2$ is a distortion of $c_2$, $c_3$ and $c_4$.

Therefore we can modify the slope constraints and local distance
functions of Sakoe and Chiba (1978) and use them for distance
computation. The modified slope constraints are shown in Figure 2.7.
Since the local distance functions $\min(i,j)$ are symmetric, the
modified time warping distance is also symmetric. The local distance
functions $\min(i,j)$ can be changed, as we will see in Chapter III.
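
The idea of matching one component against the sum of several
components can be sketched in Python as follows. This is an assumed
generalization of the local distance functions listed with Figure 2.7,
not the thesis code: the name `merged_warping_distance` and the
parameter `max_merge` (the largest number of components that may be
summed) are hypothetical, and scalar components are assumed.

```python
def merged_warping_distance(a, b, max_merge=2):
    """Warping distance in which one component may match the sum of up to
    `max_merge` consecutive components of the other string."""
    inf = float("inf")
    n, m = len(a), len(b)
    delta = [[inf] * (m + 1) for _ in range(n + 1)]
    delta[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            best = inf
            for p in range(1, max_merge + 1):
                # a_i against b_{j-p+1} + ... + b_j
                if j - p >= 0:
                    cost = abs(a[i - 1] - sum(b[j - p:j]))
                    best = min(best, delta[i - 1][j - p] + cost)
                # a_{i-p+1} + ... + a_i against b_j
                if i - p >= 0:
                    cost = abs(sum(a[i - p:i]) - b[j - 1])
                    best = min(best, delta[i - p][j - 1] + cost)
            delta[i][j] = best
    return delta[n][m]
```

With `max_merge=2` the candidate predecessors are (i-1, j-2),
(i-1, j-1) and (i-2, j-1); with `max_merge=3` the points (i-1, j-3) and
(i-3, j-1) are added. The two branches mirror each other, so swapping
the arguments gives the same value.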

C.  Normalized  Distance 

All the distance measures discussed so far are absolute distances.
For example, consider two pairs of strings, $x_1$, $y_1$ and $x_2$,
$y_2$:

x_1 = aaabbbcccddd
y_1 = acabbbcccdbd
x_2 = ad
y_2 = cb

The distance between $x_1$ and $y_1$ is two (substitution errors). The
distance between $x_2$ and $y_2$ is also two (substitution errors).
However, when the whole string length is taken into consideration, the
pair $x_1$ and $y_1$ is more similar than the pair $x_2$ and $y_2$.
This shows that equal absolute distance does not necessarily indicate
equal similarity. Sakoe and Chiba (1978) proposed a normalized distance
for dynamic time warping, which is the absolute distance divided by the
total length of the strings. When absolute distances are equal, the
normalized distance favors longer strings. The same idea can be applied
to general string distance computation with insertion, deletion and
substitution.
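
The two string pairs above can be checked with a short Python sketch.
The function names are hypothetical, and normalizing by the summed
length n + m is an assumption based on the phrase "total length of the
strings".

```python
def levenshtein(x, y):
    """Plain (type 0) Levenshtein distance, computed row by row."""
    prev = list(range(len(y) + 1))
    for i, a in enumerate(x, 1):
        cur = [i]
        for j, b in enumerate(y, 1):
            cur.append(min(prev[j] + 1,              # delete a
                           cur[j - 1] + 1,           # insert b
                           prev[j - 1] + (a != b)))  # substitute or match
        prev = cur
    return prev[-1]


def normalized_distance(x, y):
    """Absolute distance divided by the total length of both strings."""
    total = len(x) + len(y)
    return levenshtein(x, y) / total if total else 0.0


long_pair = normalized_distance("aaabbbcccddd", "acabbbcccdbd")   # 2 / 24
short_pair = normalized_distance("ad", "cb")                      # 2 / 4
```

Both pairs have absolute distance 2, but the normalized value of the
longer pair is much smaller, which matches the intuition that it is the
more similar pair.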


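[Figure 2.7: path diagrams of three slope constraints, each with its
local distance function]

(a) $\delta[i,j] = \min\{\delta[i,j-1] + |a_i - b_j|,\ \delta[i-1,j] + |a_i - b_j|\}$

(b) $\delta[i,j] = \min\{\delta[i-1,j-2] + |a_i - b_j - b_{j-1}|,\ \delta[i-1,j-1] + |a_i - b_j|,\ \delta[i-2,j-1] + |a_{i-1} + a_i - b_j|\}$

(c) $\delta[i,j] = \min\{\delta[i-1,j-3] + |a_i - b_j - b_{j-1} - b_{j-2}|,\ \delta[i-1,j-2] + |a_i - b_j - b_{j-1}|,\ \delta[i-1,j-1] + |a_i - b_j|,\ \delta[i-2,j-1] + |a_{i-1} + a_i - b_j|,\ \delta[i-3,j-1] + |a_{i-2} + a_{i-1} + a_i - b_j|\}$

Figure 2.7 Examples of slope constraints and corresponding local
distance functions of the modified time warping distance.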


2.2.2  Similarity  Measures  Based  on  Likelihood  Concept

The string distance measures discussed in the previous section are
for nonstochastic models. In a stochastic language, every string is
associated with a probability (Fu and Huang, 1972). Therefore, we use
probability, instead of weight, to characterize the transformations.
Some stochastic context-dependent transformations have already been
proposed: substitution by Fung and Fu (1975), and substitution and
insertion by Lu and Fu (1977b). Here we add a context-dependent
deletion transformation. We still use $T_S$, $T_I$ and $T_D$ to
represent the substitution, insertion and deletion transformations
respectively. Associated with $T_S$, $T_I$ and $T_D$ we use $P_S$,
$P_I$ and $P_D$ for the transformation probabilities. Transformations
with context-dependent probabilities are defined as follows:

(1) $\omega_1 a \omega_2 \overset{T_S,\ P_S(b|a)}{\vdash} \omega_1 b \omega_2$ for all $a, b \in \Sigma$,
    where $P_S(b|a)$ is the probability of substituting b for a.

(2) $\omega_1 a b \omega_2 \overset{T_D,\ P_D(b|ab)}{\vdash} \omega_1 b \omega_2$ for all $a \in \Sigma$, $b \in \Sigma'$,
    where $P_D(b|ab)$ is the probability of deleting a in front of b.

(3) $\omega_1 a \omega_2 \overset{T_I,\ P_I(ba|a)}{\vdash} \omega_1 b a \omega_2$ for all $b \in \Sigma$, $a \in \Sigma'$,
    where $P_I(ba|a)$ is the probability of inserting b in front of a.

where $\omega_1, \omega_2 \in \Sigma^*$, and

$\sum_{b \in \Sigma} P_S(b|a) + \sum_{b \in \Sigma'} P_D(b|ab) + \sum_{b \in \Sigma} P_I(ba|a) = 1$

for all $a \in \Sigma$.

The probability associated with the transformation of one string
into another is called the stochastic similarity. A higher
transformation probability between two strings means they are more
similar. As with the various weights for nonstochastic cases in Section
2.2.1, we can also define many different types of transformation
probabilities, for example context-independent, terminal-independent or
transformation-independent. Since they are simplified versions of the
one just defined, we only use the above one as an example in the
following.

Definition 2.8 The stochastic similarity between strings x and y,
$d_S(x,y)$, is defined as

$d_S(x,y) = p(y|x) = \max_J q_J(y|x)$

where $q_J(y|x)$ is the probability of the sequence of transformations
J which derives y from x.

The transformation probability $p(y|x)$ is the maximum probability
among those associated with all the possible transformations from x to
y.

Theorem 2.4 $d_S(x,y)$ is symmetric if and only if $P_D(a|ba) = P_I(ba|a)$
and $P_S(b|c) = P_S(c|b)$ for all $b, c \in \Sigma$, $a \in \Sigma'$.

The stochastic similarity can also be computed by dynamic
programming. A local probability function replaces the local distance
function of the nonstochastic cases, and it selects the maximum of the
probabilities that come from below, left and lower left; see Figure 2.8
for a graphic illustration.

Algorithm 2.3 Computation of stochastic string similarity

Input: Two strings $x = a_1 a_2 \ldots a_n a_{n+1}$ and $y = b_1 b_2 \ldots b_m b_{m+1}$,
where $a_i, b_j \in \Sigma$ for all $1 \le i \le n$, $1 \le j \le m$,
$a_{n+1} = \&$, $b_{m+1} = \&$, and the probabilities associated with
transformations on terminals in $\Sigma$ and $\Sigma'$.

Output: The stochastic similarity $d_S(x,y)$.

Method:

(1) $\delta[0,0] := 1$;

(2) for $i := 1$ to $n$ do $\delta[i,0] := \delta[i-1,0] \cdot P_D(b_1 | a_i b_1)$;

(3) for $j := 1$ to $m$ do $\delta[0,j] := \delta[0,j-1] \cdot P_I(b_j a_1 | a_1)$;

(4) for $i := 1$ to $n$ do

      for $j := 1$ to $m$ do begin

        $\delta[i,j] := \max\{\delta[i,j-1] \cdot P_I(b_j a_{i+1} | a_{i+1}),\ \delta[i-1,j-1] \cdot P_S(b_j | a_i),\ \delta[i-1,j] \cdot P_D(b_{j+1} | a_i b_{j+1})\}$;

      end;

(5) $d_S(x,y) := \delta[n,m]$;

We  can  also  use  a  global  path  constraint  here  to  speed  up  the  com¬ 
putation. 
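
The max-product recurrence can be illustrated with a Python sketch that
uses context-independent probabilities, one of the simplified versions
mentioned above, rather than the context-dependent ones of Algorithm
2.3. The function name `stochastic_similarity` and the probability
values are hypothetical and chosen only so that they sum to one for a
two-symbol alphabet.

```python
def stochastic_similarity(x, y, p_sub, p_del, p_ins):
    """Maximum probability of deriving y from x.

    p_sub(b, a): probability of substituting b for a (b may equal a),
    p_del(a): probability of deleting a, p_ins(b): probability of
    inserting b. All three are context-independent (an assumption).
    """
    n, m = len(x), len(y)
    delta = [[0.0] * (m + 1) for _ in range(n + 1)]
    delta[0][0] = 1.0
    for i in range(1, n + 1):
        delta[i][0] = delta[i - 1][0] * p_del(x[i - 1])
    for j in range(1, m + 1):
        delta[0][j] = delta[0][j - 1] * p_ins(y[j - 1])
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            a, b = x[i - 1], y[j - 1]
            delta[i][j] = max(delta[i][j - 1] * p_ins(b),
                              delta[i - 1][j - 1] * p_sub(b, a),
                              delta[i - 1][j] * p_del(a))
    return delta[n][m]


# Illustrative probabilities over the alphabet {a, b}:
# 0.8 + 0.1 (substitution) + 0.04 (deletion) + 2 * 0.03 (insertion) = 1.
p_sub = lambda b, a: 0.8 if a == b else 0.1
p_del = lambda a: 0.04
p_ins = lambda b: 0.03

same = stochastic_similarity("ab", "ab", p_sub, p_del, p_ins)
shorter = stochastic_similarity("ab", "a", p_sub, p_del, p_ins)
longer = stochastic_similarity("a", "ab", p_sub, p_del, p_ins)
```

Because the assumed deletion and insertion probabilities differ, the
similarity of "ab" to "a" differs from that of "a" to "ab", which is the
kind of asymmetry that Theorem 2.4 rules out when the two probabilities
agree.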

The similarity measure is one of the fundamental constituents of
pattern recognition. In some applications, for example string matching,
the recognition accuracy relies almost entirely on the accuracy of the
similarity measure. Even error-correcting parsing is closely related to
similarity measures. We discuss the relation between error-correcting
(EC) parsing and similarity measures in the next section. The distance
measures defined in this chapter are not metrics. They have the
properties of positivity and symmetry, but do not necessarily satisfy
the triangle inequality. The accuracy of an actual similarity measure
depends on many parameters. The most significant one is the assignment
of weights and probabilities, which is case-dependent and usually
heuristic. Prior knowledge and statistics may guide the assignment in
some cases.

2.3  Error-Correcting  Parsing

Error-correcting parsers (ECP) have been proposed in the areas of
compiler design (Aho and Peterson, 1972) and syntactic pattern
recognition (Fu, 1977). When a conventional parser fails to parse a
string, it terminates and rejects the string. An error-correcting
parser produces the same result as a conventional one when the string
is syntactically correct, but it also generates a parse for the string
when it has minor syntax errors. The significance of error-correcting
parsing in compiler design is still controversial, since it may
misinterpret the programmer's intention. Its significance in syntactic
pattern recognition, however, is unquestionable. The most important
reason is the noise problem. The noise may come from the sensor device,
the environment or data communication. It causes segmentation errors
and primitive recognition errors, and therefore results in syntax
errors. In many cases, the pattern grammars are constructed from a
finite set of training samples and then used to recognize a larger set
of test samples. Therefore, it is not surprising that conventional
parsers usually fail to work.


Error-correcting parsing algorithms can be classified into two
categories: one uses the minimum-distance criterion, the other uses the
maximum-likelihood criterion. The minimum-distance error-correcting
parser (MDECP) is for nonstochastic models, where string similarity is
measured by distance. The maximum-likelihood error-correcting parser
(MLECP) is for stochastic models, where string similarity is measured
by probability.

The ECP in this chapter differs from other existing ECPs in two
respects: first, it uses symmetric similarity measures; second, it
does not use an expanded grammar.

2.3.1  Minimum-Distance  Error-Correcting  Parsing  Algorithm 

For the purpose of generality we discuss context-free grammars (CFG)
throughout this chapter. Since a finite-state language (FSL) is a
context-free language, all the principles described here can be applied
to FSL as well. Of course, the implementation can be modified to
increase the efficiency. Given a CFG G and an input string
$y \in \Sigma^*$, a minimum-distance error-correcting parser (MDECP)
generates a parse for some string $x \in L(G)$ such that the distance
between x and y, $d(x,y)$, is as small as possible. Since we have
defined several different string distances, different error-correcting
parsers can be constructed.

Aho and Peterson (1972) presented a minimum-distance error-correcting
parsing algorithm which uses the Levenshtein distance. We call their
algorithm "Algorithm A" for short. They first transformed the original
grammar into an expanded grammar which includes all the possible error
productions. Then they modified Earley's parsing algorithm so that the
number of error productions used is stored in the item list. The set of
productions of the expanded grammar, P', is constructed from P as
follows:


(1) For each production in P, replace all terminals $a \in \Sigma$ by
    a new nonterminal $E_a$ and add these productions to P'.

(2) Add to P' the productions

    a) $S' \to S$

    b) $S' \to S H$

    c) $H \to H I$

    d) $H \to I$

(3) For each $a \in \Sigma$, add to P' the productions

    a) $E_a \to a$

    b) $E_a \to b$ for all b in $\Sigma$, $b \ne a$

    c) $E_a \to H a$

    d) $I \to a$

    e) $E_a \to \lambda$, where $\lambda$ is the empty string

In step (3), the productions $E_a \to b$, $I \to a$ and
$E_a \to \lambda$ are called terminal error productions. The production
$E_a \to b$ introduces a substitution error, $I \to a$ introduces an
insertion error, and $E_a \to \lambda$ introduces a deletion error. For
the Levenshtein distance, a constant weight, e.g., 1, is associated
with each of these productions. The type 1 WLD $d_{W1}(x,y)$ and the
type 2 WLD $d_{W2}(x,y)$ can be handled in a similar way. For the type
1 WLD, weight $\sigma$ is associated with production $E_a \to b$, weight
$\gamma$ with $E_a \to \lambda$ and weight $\rho$ with $I \to a$. For
the type 2 WLD, weight $S(a,b)$ is associated with production
$E_a \to b$, weight $D(a)$ with $E_a \to \lambda$ and weight $I(a)$
with $I \to a$. A problem arises, however, with the type 3 WLD
$d_{W3}(x,y)$. In order to maintain the symmetric property we must have
$D(a,b) = I(a,b)$ for all $b \in \Sigma$, $a \in \Sigma'$, as stated in
Theorem 2.3. The expanded grammar has difficulty handling
context-dependent transformation weights.
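
The construction of the expanded production set in steps (1) to (3) can
be sketched in Python as follows. The representation (a production as a
pair of a left-hand side and a tuple of right-hand-side symbols), the
function name `expand_grammar`, and the convention that non-error
productions carry weight 0 are all assumptions made for illustration;
the sketch also assumes that the names S', H, I and E_a do not clash
with nonterminals of the original grammar.

```python
def expand_grammar(productions, terminals, start="S"):
    """Return the expanded productions as (lhs, rhs, weight) triples.

    Terminal error productions get weight 1 (Levenshtein case); all
    other productions get weight 0 (an assumed convention).
    """
    def e(a):
        return "E_" + a                       # nonterminal E_a for terminal a

    expanded = []
    # Step (1): replace every terminal a by the nonterminal E_a.
    for lhs, rhs in productions:
        new_rhs = tuple(e(s) if s in terminals else s for s in rhs)
        expanded.append((lhs, new_rhs, 0))
    # Step (2): allow a run of inserted symbols after the sentence.
    expanded += [("S'", (start,), 0), ("S'", (start, "H"), 0),
                 ("H", ("H", "I"), 0), ("H", ("I",), 0)]
    # Step (3): productions for each terminal.
    for a in sorted(terminals):
        expanded.append((e(a), (a,), 0))              # a) correct symbol
        for b in sorted(terminals):
            if b != a:
                expanded.append((e(a), (b,), 1))      # b) substitution error
        expanded.append((e(a), ("H", a), 0))          # c) insertions before a
        expanded.append(("I", (a,), 1))               # d) insertion error
        expanded.append((e(a), (), 1))                # e) deletion error
    return expanded


# Example grammar for { a^n b^n : n >= 1 }.
grammar = [("S", ("a", "S", "b")), ("S", ("a", "b"))]
expanded = expand_grammar(grammar, {"a", "b"})
```

For type 1 or type 2 weights, the constant 1 in steps b), d) and e)
would be replaced by the corresponding substitution, insertion and
deletion weights; the context-dependent type 3 weights do not fit this
scheme, as the text explains.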


Although we can modify this expanded grammar to handle insertion
weights, as was done in Fu (1982), it still cannot handle the deletion
weights. The productions associated with context-dependent deletion
weights would be of the form $b E_a \to E_a$ with weight $D(a,b)$, which
is not a context-free production rule, and not even a
context-sensitive one. While the expanded grammars seem unable to solve
the symmetry problem, we can implement the ECP without the expanded
grammar. The idea of an ECP without an expanded grammar appeared in
Lyon (1974), where the type 0 distance is used. His main concern was
practical: to save space and execution time. Our proposed ECP using the
type 3 WLD is a modified Earley's parsing algorithm in which the
substitution, insertion and deletion transformations are examined
during the parsing.

Algorithm.  2.4.  Minimum-Distance  Error-Correcting  Parsing  Algorithm 

Input:  A  grammar  G  =  (N£,P,S),  an  input  string 

y  =  b  j b2...bm  in  S',  and  the  weights  of  transformations 
between  symbols. 

Output .  The  parse  lists  70,  and  d(x,y )  where 

z  is  the  minimum-distance  correction  of  y,  x  e  L(G). 

Method: 

(1)  Set  j  =  0.  Add  [5 -*  •  a, 0,0]  to  Ij  if  5  -*a  is  a  production  in  P. 

(2)  Repeat  step  (3)  and  (4)  until  no  new  items  can  be  added  to  Ij. 

(3)  If  [A-*a  ■  is  in  /j-,and  B  -*y  is  a  production  in  P .  then  add 

item  [B -*  •  7.J.0]  to  Ij. 

(4) If $[A \to \alpha\,\cdot, i, \xi]$ is in $I_j$ and $[B \to \beta \cdot A\gamma, k, \zeta]$ is in $I_i$, and if no item of the form $[B \to \beta A \cdot \gamma, k, \varphi]$ can be found in $I_j$, then add an item $[B \to \beta A \cdot \gamma, k, \xi + \zeta]$ to $I_j$. Store with this item two pointers. The first points to item $[B \to \beta \cdot A\gamma, k, \zeta]$ in $I_i$; the second points to item $[A \to \alpha\,\cdot, i, \xi]$ in $I_j$. If $[B \to \beta A \cdot \gamma, k, \varphi]$ is already in $I_j$, then replace $\varphi$ by $\xi + \zeta$ together with the pointers if $\varphi > \xi + \zeta$.

(5) For each $[B \to \alpha \cdot a\beta, i, \xi]$ in $I_j$, add $[B \to \alpha a \cdot \beta, i, \xi + D(b_j, a)]$ to $I_j$. Store with this item a pointer to item $[B \to \alpha \cdot a\beta, i, \xi]$ in $I_j$. If no more new items of this form can be found, go to step (6); otherwise, go to step (2).

(6) If $j = m$, go to step (9); otherwise $j = j + 1$.

(7) For each item $[B \to \alpha \cdot a\beta, i, \xi]$ in $I_{j-1}$, add $[B \to \alpha a \cdot \beta, i, \xi + S(a, b_j)]$ to $I_j$. Store with this item a pointer to item $[B \to \alpha \cdot a\beta, i, \xi]$ in $I_{j-1}$.

(8) For each item $[B \to \alpha \cdot a\beta, i, \xi]$ in $I_{j-1}$, add $[B \to \alpha \cdot a\beta, i, \xi + I(a, b_j)]$ to $I_j$. Store with this item a pointer to item $[B \to \alpha \cdot a\beta, i, \xi]$ in $I_{j-1}$. Go to (2).

(9) If item $[S \to \alpha\,\cdot, 0, \xi]$ is in $I_m$, then $d(x,y) = \xi$. If there is more than one such item, then choose one with the smallest $\xi$. Exit.

In this algorithm, step (5) examines deletion transformations, step (7) examines substitution transformations and step (8) examines insertion transformations.
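As a rough illustration of what a minimum-distance error-correcting recognizer computes, the following sketch handles only a special case and is not the Earley-based algorithm above. It assumes a finite-state grammar given as a transition table and context-free (not context-dependent) positive weights, and it returns only the distance between $y$ and the language. The function name, the table layout and the unit default weights are hypothetical choices made for this example.

```python
def min_distance_to_fsa(y, transitions, start, finals,
                        sub_w=lambda a, b: 0 if a == b else 1,
                        ins_w=lambda b: 1,
                        del_w=lambda a: 1):
    """Minimum weighted edit distance between y and the language of a
    finite-state grammar (assumed special case, positive weights).

    transitions: dict mapping state -> list of (symbol, next_state).
    """
    states = set(transitions) | {start} | set(finals)
    for arcs in transitions.values():
        states |= {t for _, t in arcs}
    inf = float("inf")

    def close_deletions(col):
        # Follow arcs without consuming input, i.e. delete the arc symbol.
        # Relax repeatedly until no state cost improves.
        changed = True
        while changed:
            changed = False
            for s in states:
                for a, t in transitions.get(s, []):
                    cost = col[s] + del_w(a)
                    if cost < col[t]:
                        col[t] = cost
                        changed = True
        return col

    col = {s: inf for s in states}
    col[start] = 0
    col = close_deletions(col)
    for b in y:
        nxt = {s: inf for s in states}
        for s in states:
            if col[s] == inf:
                continue
            # Insertion: b is an extra symbol, the state does not change.
            nxt[s] = min(nxt[s], col[s] + ins_w(b))
            # Substitution (or exact match when a == b).
            for a, t in transitions.get(s, []):
                nxt[t] = min(nxt[t], col[s] + sub_w(a, b))
        col = close_deletions(nxt)
    return min(col[f] for f in finals)


# Hypothetical grammar for the language a b* c.
ABC = {0: [("a", 1)], 1: [("b", 1), ("c", 2)], 2: []}
```

With this table, `min_distance_to_fsa("abbc", ABC, 0, {2})` is 0, while a string missing its final `c` is one deletion away from the language.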

The  right  parse  of  the  input  string  can  be  constructed  from  the 
parse  lists.  Since  we  use  error-correcting  parsing,  it  is  possible  that 
there  may  exist  several  parses  associated  with  one  input  string,  but  we 
only  choose  the  one  with  minimum  distance. 

Algorithm 2.5. Construction of a right parse from the parse lists

Input: $I_0, I_1, \ldots, I_m$, the parse lists for string $y = b_1 b_2 \ldots b_m$.

Output: A parse $\pi$ for $x$, $x \in L(G)$, such that the distance $d^W(x,y)$ is minimum among all the strings in $L(G)$.

Method:

(1) In $I_m$ choose an item of the form $[S \to \alpha\,\cdot, 0, \xi]$ where $\xi$ is as small as possible.

(2) Let $\pi$ be the empty string initially, and then execute the routine $R([S \to \alpha\,\cdot, 0, \xi], m)$, where $R([A \to \alpha \cdot \beta, i, \eta], j)$ is defined as follows:

a) If $\beta = \lambda$, then let $\pi$ be the previous value of $\pi$ followed by the production number of $A \to \alpha$. Otherwise, $\pi$ is unchanged.

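b) If $[A \to \alpha \cdot \beta, i, \eta]$ has only one pointer, then execute $R$ on the item it points to. This may be $R([A \to \alpha \cdot \beta, i, \xi], j-1)$, $R([A \to \alpha' \cdot a\beta, i, \xi], j)$ or $R([A \to \alpha' \cdot a\beta, i, \xi], j-1)$, where $\alpha = \alpha' a$. Return.

c) If $[A \to \alpha \cdot \beta, i, \eta]$ has two pointers and $\alpha = \alpha' B$, then execute $R([B \to \gamma\,\cdot, h, \xi], j)$ followed by $R([A \to \alpha' \cdot B\beta, i, \zeta], h)$. Return.

d) If $\alpha = \lambda$, return.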

The parse constructed by Algorithm 2.5 is for $x$, $x \in L(G)$, i.e., no error productions are included. Usually there is no need to know the error productions (or, equivalently, the error transformations); but if we do need to know them, we can store information such as $D(b_j, a)$, $S(a, b_j)$ or $I(a, b_j)$ in each item. Then we can extract the exact transformations when we execute the $R$ routines. If we are only interested in the minimum distance, for example, to determine the class membership, then Algorithm 2.4 will be sufficient.

Algorithm  2.4  is  more  powerful  (because  its  parse  is  in  terms  of 
symmetric  distance)  and  is  at  least  as  efficient  as  Algorithm  A. 


Lemma 2.1: The time complexity of Algorithm 2.4 is $O(n^3)$, where $n$ is the length of the input string.

The proof of Lemma 2.1 is similar to that of Aho and Peterson (1972). Since each item list $I_j$ takes time $O(j^2)$ to complete, the total time is $O(n^3)$. We can also show that the number of productions and the number of items in the item lists of Algorithm 2.4 are smaller than those of Algorithm A. Therefore, fewer productions and items have to be considered when we add new items to the item lists. For each item $[B \to \alpha \cdot a\beta, i, \xi]$ in $I_{j-1}$ in Algorithm 2.4 there is an item $[B \to \alpha \cdot E_a\beta, i, \xi]$ in $I_{j-1}$ in Algorithm A. Let us consider the following transformations:

(1) Substitution. There is an item $[E_a \to \cdot\,b, j-1, S(a,b)]$ in $I_{j-1}$, where $b = b_j$, and items $[E_a \to b\,\cdot, j-1, S(a,b)]$ and $[B \to \alpha E_a \cdot \beta, i, \xi + S(a,b)]$ in $I_j$ in Algorithm A. There is only one item $[B \to \alpha a \cdot \beta, i, \xi + S(a, b_j)]$ in $I_j$ in Algorithm 2.4.

(2) Deletion. There are items $[E_a \to \cdot\,\lambda, j-1, D(a)]$ and $[B \to \alpha E_a \cdot \beta, i, \xi + D(a)]$ in $I_{j-1}$ in Algorithm A. There is only one item $[B \to \alpha a \cdot \beta, i, \xi + D(b_j, a)]$ in $I_j$ in Algorithm 2.4.

(3) Insertion. There are items $[E_a \to \cdot\,Ha, j-1, 0]$, $[H \to \cdot\,I, j-1, 0]$ and $[I \to \cdot\,b, j-1, I(b)]$, where $b = b_j$, in $I_{j-1}$, and items $[I \to b\,\cdot, j-1, I(b)]$, $[H \to I\,\cdot, j-1, I(b)]$ and $[E_a \to H \cdot a, j-1, I(b)]$ in $I_j$ in Algorithm A. There is only one item $[B \to \alpha \cdot a\beta, i, \xi + I(a, b_j)]$ in $I_j$ in Algorithm 2.4.

Since all the other items not involving error transformations are unchanged, the time complexity of Algorithm 2.4 is no more than that of Algorithm A, i.e., the time complexity of Algorithm 2.4 is $O(n^3)$.


We have shown a minimum-distance error-correcting parsing algorithm for any nonstochastic CFG. The distance is symmetric and can be any one described in Section 2.2. For a stochastic CFG, we can also construct a maximum-likelihood error-correcting parser, which will be discussed in the next section.

2.3.2  Maximum-Likelihood  Error-Correcting  Parsing  Algorithm 

Given a stochastic context-free grammar (SCFG) $G_s$ and an input string $y \in \Sigma^*$, a maximum-likelihood error-correcting parser (MLECP) generates a parse for some string $x \in L(G_s)$ such that the probability $p(y \mid x)p(x)$ is the maximum, where $p(y \mid x)$ is the deformation probability from string $x$ to $y$ and $p(x)$ is the probability associated with string $x$ in $L(G_s)$ (Fu, 1982). There may exist more than one derivation tree for each $x \in L(G_s)$ unless the grammar $G_s$ is unambiguous. Meanwhile, there will be many possible transformations from string $x$ to $y$. We define $p(y \mid x)p(x)$ as the one with maximum probability, i.e.,

$$p(y \mid x)\,p(x) = \max_{i,j}\; q_j(y \mid x)\,p_i(x)$$

where $p_i(x)$ is the probability associated with the $i$th distinct derivation of string $x$ and $q_j(y \mid x)$ is the probability associated with the $j$th distinct transformation from $x$ to $y$. The probability $p(y \mid x)$, which is equal to $\max_j q_j(y \mid x)$, is exactly the same as what we defined for string similarity in Section 2.4.

The proposed MLECP is a modified Earley's parsing algorithm. It does not require an expanded grammar and is applicable to ambiguous grammars. The transformation probabilities as well as the insertion, deletion and substitution transformations are examined during the parsing. The partial probabilities are stored in each item list. Pointers to the previous items are also stored in the item lists to save parse extraction time.

Algorithm 2.6. Maximum-Likelihood Error-Correcting Parsing Algorithm

Input: A stochastic grammar $G_s = (N, \Sigma, P_s, S)$, an input string $y = b_1 b_2 \ldots b_m$ in $\Sigma^*$, and the probabilities of transformations.

Output: The parse lists $I_0, I_1, \ldots, I_m$, and $p(y \mid x)p(x)$, where $x$ is the maximum-likelihood correction of $y$, $x \in L(G_s)$.

Method:

(1) Set $j = 0$. Add $[S \to \cdot\,\alpha, 0, p]$ to $I_j$ if $S \xrightarrow{p} \alpha$ is a production in $P_s$.

(2) Repeat steps (3) and (4) until no new items can be added to $I_j$.

(3) If $[A \to \alpha \cdot B\beta, i, \xi]$ is in $I_j$, and $B \xrightarrow{q} \gamma$ is a production in $P_s$, then add item $[B \to \cdot\,\gamma, j, q]$ to $I_j$.

(4) If $[A \to \alpha\,\cdot, i, \xi]$ is in $I_j$ and $[B \to \beta \cdot A\gamma, k, \zeta]$ is in $I_i$, and if no item of the form $[B \to \beta A \cdot \gamma, k, \varphi]$ can be found in $I_j$, then add an item $[B \to \beta A \cdot \gamma, k, \xi \cdot \zeta]$ to $I_j$. Store with this item two pointers. The first points to item $[B \to \beta \cdot A\gamma, k, \zeta]$ in $I_i$; the second points to item $[A \to \alpha\,\cdot, i, \xi]$ in $I_j$. If $[B \to \beta A \cdot \gamma, k, \varphi]$ is already in $I_j$, then replace $\varphi$ by $\xi \cdot \zeta$ together with the pointers if $\varphi < \xi \cdot \zeta$.

(5) For each $[B \to \alpha \cdot a\beta, i, \xi]$ in $I_j$, add $[B \to \alpha a \cdot \beta, i, \xi \cdot P_D(a \mid b_j a)]$ to $I_j$. Store with this item a pointer to item $[B \to \alpha \cdot a\beta, i, \xi]$ in $I_j$. If no more new items of this form can be found, go to step (6); otherwise, go to step (2).

(6) If $j = m$, go to step (9); otherwise $j = j + 1$.

(7) For each item $[B \to \alpha \cdot a\beta, i, \xi]$ in $I_{j-1}$, add $[B \to \alpha a \cdot \beta, i, \xi \cdot P_S(b_j \mid a)]$ to $I_j$. Store with this item a pointer to item $[B \to \alpha \cdot a\beta, i, \xi]$ in $I_{j-1}$.

(8) For each item $[B \to \alpha \cdot a\beta, i, \xi]$ in $I_{j-1}$, add $[B \to \alpha \cdot a\beta, i, \xi \cdot P_I(b_j a \mid a)]$ to $I_j$. Store with this item a pointer to item $[B \to \alpha \cdot a\beta, i, \xi]$ in $I_{j-1}$. Go to (2).

(9) If item $[S \to \alpha\,\cdot, 0, \xi]$ is in $I_m$, then $p(y \mid x)p(x) = \xi$. If there is more than one such item, then choose one with the largest $\xi$. Exit.

The right parse can be extracted from the parse lists. Algorithm 2.5 can be applied here, except that in step (1) we choose an item of the form $[S \to \alpha\,\cdot, 0, \xi]$ in $I_m$ for which $\xi$ is as large as possible. The parse extracted here contains no error productions. We can also store and extract the error transformations as was done in the last section. The time complexity of Algorithm 2.6 is also $O(n^3)$, since the procedures are almost identical to those of Algorithm 2.4.

Lemma 2.2: The time complexity of Algorithm 2.6 is $O(n^3)$, where $n$ is the length of the input string.

Suppose $G_s'$ is an expanded grammar; then the stochastic language generated by $G_s'$ is

$$L(G_s') = \Bigl\{ (y, p(y)) \;\Big|\; y \in \Sigma^*,\; p(y) = \sum_{x \in L(G_s)} \sum_{i=1}^{r} q_i(y \mid x)\,p(x) \Bigr\}$$

where $r$ is the number of distinct transformations from string $x$ to $y$, $q_i(y \mid x)$ is the probability associated with the $i$th transformation and $p(x)$ is the probability associated with $x$. Although string $y$ is generated by the expanded grammar $G_s'$, the probability associated with $y$, $p(y)$, can be computed without the expanded grammar.

Algorithm 2.7. Computation of String Probability

Input: A stochastic grammar $G_s = (N, \Sigma, P_s, S)$, an input string $y = b_1 b_2 \ldots b_m$ in $\Sigma^*$, and the probabilities of transformations.

Output: The probability associated with $y$, $p(y)$, where $y$ is generated by the expanded grammar $G_s'$.

Method:

(1) Set $j = 0$. Add $[S \to \cdot\,\alpha, 0, p]$ to $I_j$ if $S \xrightarrow{p} \alpha$ is a production in $P_s$.

(2) Repeat steps (3) and (4) until no new items can be added to $I_j$.

(3) If $[A \to \alpha \cdot B\beta, i, \xi]$ is in $I_j$, and $B \xrightarrow{q} \gamma$ is a production in $P_s$, then add item $[B \to \cdot\,\gamma, j, q]$ to $I_j$.

(4) If $[A \to \alpha\,\cdot, i, \xi]$ is in $I_j$ and $[B \to \beta \cdot A\gamma, k, \zeta]$ is in $I_i$, and if no item of the form $[B \to \beta A \cdot \gamma, k, \varphi]$ can be found in $I_j$, then add an item $[B \to \beta A \cdot \gamma, k, \xi \cdot \zeta]$ to $I_j$. If $[B \to \beta A \cdot \gamma, k, \varphi]$ is already in $I_j$, then replace $\varphi$ by $\varphi + \xi \cdot \zeta$.

(5) For each $[B \to \alpha \cdot a\beta, i, \xi]$ in $I_j$, add $[B \to \alpha a \cdot \beta, i, \xi \cdot P_D(a \mid b_j a)]$ to $I_j$. If no more new items of this form can be found, go to step (6); otherwise, go to step (2).

(6) If $j = m$, go to step (9); otherwise $j = j + 1$.

(7) For each item $[B \to \alpha \cdot a\beta, i, \xi]$ in $I_{j-1}$, add $[B \to \alpha a \cdot \beta, i, \xi \cdot P_S(b_j \mid a)]$ to $I_j$.

(8) For each item $[B \to \alpha \cdot a\beta, i, \xi]$ in $I_{j-1}$, add $[B \to \alpha \cdot a\beta, i, \xi \cdot P_I(b_j a \mid a)]$ to $I_j$. Go to (2).

(9) For all items of the form $[S \to \alpha_i\,\cdot, 0, \xi_i]$ in $I_m$, sum up all the $\xi_i$: $p(y) = \sum_i \xi_i$. Exit.

Algorithm 2.7 is useful in computing the class conditional probability, which is used in Bayes' decision rule. Given a string $y$, Algorithm 2.7 computes the probability that $y$ is generated by $G_s'$, i.e., $p(y \mid G_s')$. Even for a string $x \in L(G_s)$, we can compute the probability $p(x \mid G_s)$, where $G_s$ is the original stochastic grammar. This capability is important because, from the generation point of view, it is very difficult to find the sum of the probabilities of all the possible derivations, whereas by parsing the probability is very easy to obtain. If we are dealing with $G_s$ only, i.e., finding the probability $p(x \mid G_s)$, $x \in L(G_s)$, then we should skip steps (5), (7) and (8) of Algorithm 2.7, since these steps handle the deletion, substitution and insertion deformations.
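As a sketch of how this probability enters Bayes' decision rule, suppose there are $M$ classes with expanded grammars $G_{s,1}', \ldots, G_{s,M}'$ and prior probabilities $P(C_1), \ldots, P(C_M)$. The indexing and the symbols $M$, $C_k$ and $P(C_k)$ are notation assumed here for illustration only. A string $y$ would then be assigned to the class

$$k^* = \arg\max_{1 \le k \le M}\; p(y \mid G_{s,k}')\,P(C_k),$$

with each $p(y \mid G_{s,k}')$ obtained from Algorithm 2.7.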


2.4  Recognition  Procedures  for  Syntactic  Patterns 

In syntactic pattern recognition, if classification is the only purpose, then we can use either ECP or NNR. Given a CFG $G$ and an input string $y \in \Sigma^*$, an MDECP generates a parse for some string $x \in L(G)$ such that the distance between $x$ and $y$ is as small as possible. This distance is defined as the distance between string $y$ and the set $L(G)$. On the other hand, an MLECP generates a parse for some string $x \in L(G_s)$ such that the probability $p(y \mid x)p(x)$ is the maximum. This probability is defined as the likelihood that string $y$ belongs to the set $L(G_s)$. The nearest-neighbor rule for the case that $G$ is nonstochastic computes the distances between $y$ and all the strings $x \in L(G)$ and selects the one corresponding to the smallest distance. The nearest-neighbor rule for the stochastic case computes the probabilities $p(y \mid x)$ for all $x \in L(G_s)$ and selects the one for which the product $p(y \mid x)p(x)$ is the maximum. The probability density function for $x$, $p(x)$, is assumed known.

If $L(G)$ is finite, then either MDECP or NNR can be used to find $x$ and $d(x,y)$. Similarly, if $L(G_s)$ is finite, then we can use either MLECP or NNR to find $x$ and $p(y \mid x)p(x)$. The results of ECP and NNR may differ depending on how the grammars $G$ and $G_s$ are constructed. If $L(G)$ or $L(G_s)$ is not finite, then the NNR will not be able to test the whole of $L(G)$ or $L(G_s)$. Thus, it is necessary to find a finite subset of $L(G)$ or $L(G_s)$ so that the NNR can be implemented. This is also true when $L(G)$ or $L(G_s)$ is finite but too large to manage. However, neither MDECP nor MLECP has difficulty in dealing with an infinite language. Therefore, it is advantageous to use MDECP or MLECP when $L(G)$ or $L(G_s)$ is infinite, since the recognition accuracy of NNR may be degraded by the limited number of prototypes. In real applications, however, we usually encounter a finite set of samples and need to construct or infer a grammar from these samples. In this case, the recognition results of the two approaches will be equal if the constructed or inferred grammar generates exactly the original samples. The only factor affecting the choice of algorithm is then computation speed. The NNR uses a dynamic programming technique whose time complexity is $O(n^2)$, where $n$ is the length of the input string. The complexity of MDECP and MLECP is also $O(n^2)$ if the grammar is unambiguous. Although both ECP and NNR have $O(n^2)$ time complexity, NNR is usually faster than ECP. We will see an example in Chapter III.
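The dynamic programming computation used by the NNR can be sketched as follows. This is a minimal illustration assuming context-free weights (unit weights by default) rather than the context-dependent weights discussed earlier in this chapter. The function names and the prototype format are hypothetical.

```python
def string_distance(x, y, sub_w=None, ins_w=None, del_w=None):
    """Weighted Levenshtein distance from x to y by dynamic programming.

    Runs in O(len(x) * len(y)) time. The default weights are unit costs,
    which is an assumption of this sketch.
    """
    sub_w = sub_w or (lambda a, b: 0 if a == b else 1)
    ins_w = ins_w or (lambda b: 1)
    del_w = del_w or (lambda a: 1)

    # prev[j] holds the distance from the processed prefix of x to y[:j].
    prev = [0] * (len(y) + 1)
    for j, b in enumerate(y, 1):
        prev[j] = prev[j - 1] + ins_w(b)

    for a in x:
        cur = [prev[0] + del_w(a)]
        for j, b in enumerate(y, 1):
            cur.append(min(prev[j - 1] + sub_w(a, b),   # substitution
                           prev[j] + del_w(a),          # deletion
                           cur[j - 1] + ins_w(b)))      # insertion
        prev = cur
    return prev[-1]


def nearest_neighbor(y, prototypes):
    """Return (label, distance) of the prototype string closest to y.

    prototypes: iterable of (string, label) pairs.
    """
    scored = [(string_distance(x, y), label) for x, label in prototypes]
    dist, label = min(scored, key=lambda t: t[0])
    return label, dist
```

For example, `nearest_neighbor("abcc", [("abc", "explosion"), ("dddd", "earthquake")])` picks the first prototype, which is one insertion away from the input.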

2.5  Conclusion 

We have discussed four types of string similarity measures in this chapter, and the conditions for them to be symmetric. We also proposed parsing algorithms to deal with the symmetric problem, which cannot be handled by any other ECP. These algorithms are at least as efficient (computation-wise) as other parsing algorithms. A minimum-distance criterion is used for nonstochastic models and a maximum-likelihood criterion is used for stochastic models, for both ECP and NNR. Bayes' decision rule can be applied when dealing with multiclass problems of stochastic models. The class conditional probability $p(x \mid C_i)$, where $C_i = L(G_i)$, can be computed by Algorithm 2.7.

In NNR, the distance computation employs a dynamic programming procedure, which makes it very easy to implement in VLSI architectures. VLSI architectures for ECP and string distance computation will be reviewed in Chapter V. We also propose a VLSI architecture for computing the string (Levenshtein) distance in Chapter V.

CHAPTER III

APPLICATIONS  OF  SYNTACTIC  PATTERN 
RECOGNITION  TO  SEISMIC  CLASSIFICATION 

3.1  Introduction 

In this chapter we apply syntactic approaches to two real seismic classification problems. One is the seismic discrimination between nuclear explosions and natural earthquakes; the other is seismic classification in structural damage assessment. These waveforms had been sampled and digitized before we obtained the data. However, various kinds of noise exist in both cases, so certain preprocessing procedures must be applied to remove them. Sections 2 to 5 discuss the application to seismic discrimination, and Section 6 shows the application to damage assessment.

Seismological methods are so far the most effective and practical methods for detecting nuclear explosions, especially underground explosions. Position, depth and origin time of the seismic events are useful information for discrimination; so are the body wave magnitude and surface wave magnitude of the seismic wave (Bolt, 1976; Dahlman and Israelson, 1977). Unfortunately, they are not always applicable and reliable for small events. It would be very helpful if the discrimination were based on the short-period waves alone. The application of pattern recognition techniques to seismic wave analysis has been studied extensively in the last few years (Chen, 1978; Tjostheim, 1978; Sarna and Stark, 1980). These studies all use short-period waves only for discrimination. Most of them concentrated on feature selection, and only simple decision-theoretic techniques have been used. However, syntactic pattern recognition appears to be quite promising in this area. It uses the structural information of the seismic wave, which is very important in analysis. Seismic records are one-dimensional waveforms. Although there exist several alternatives (Ehrich and Foith, 1976; Sankar and Rosenfeld, 1979) for representing one-dimensional waveforms, it is most natural to represent them by sentences, i.e., strings of primitives. In order to simplify the analysis we divide the pattern representation procedure into three steps, namely, pattern segmentation, feature selection and primitive recognition, though they are correlated.

In this chapter, we apply two different syntactic methods to the recognition of seismic waves. One uses the nearest-neighbor decision rule; the other uses error-correcting parsing. In the first method, a pattern representation subsystem converts the seismic waveforms into strings of primitives. The string-to-string distances between the test sample and all the training samples are computed and then the nearest-neighbor decision rule is applied. The second method contains pattern representation, automatic grammatical inference and error-correcting parsing. The pattern representation subsystem performs pattern segmentation, feature selection and primitive recognition so as to convert the seismic wave into a string of primitives. The automatic grammatical inference subsystem infers a finite-state (regular) grammar from a finite set of training samples. The error-correcting parser performs syntax analysis and classification. Human interaction is required only at the training stage, mostly in pattern representation and slightly in grammatical inference.


3.2  Preprocessing 

The two major problems in the preprocessing of digital signals are to identify the appropriate portion for recognition and to eliminate noise. For example, the voiced portion should be separated from the unvoiced portion in speech recognition; each ECG cycle should be determined in ECG analysis; and the 'signal' should be recognized in seismic analysis. We will not discuss these in any detail, though they are important. The main reason is that their characteristics vary so widely. The seismic signals in our experiment were selected from a huge seismic database. They all have equal length and have been aligned at the onset.

Noise is always a major problem in digital signal processing. Filtering is the most common technique to remove noise: high-pass, low-pass and band-pass filters, to name a few. These filters eliminate certain regions of the frequency spectrum, which may not always be desired. For example, in Figure 3.1 there is a pulse-like noise within the seismic signal. This kind of noise is sometimes called a glitch. If we pass the signal through a low-pass filter, the filter cannot eliminate the pulse completely, while all the high-frequency components of the signal are also eliminated. This is not what we want. To avoid this, we need a local filter which removes only the pulse noise and leaves the rest of the signal unchanged. This local filtering is possible because the normal signal does not contain pulses, so the local filter can detect the pulses and then remove them. The local filtering needs human interaction to specify the threshold, and different regions need different thresholds. We can see from Figure 3.1 that the whole signal can be divided into three portions. The relatively flat portion at the beginning is the background noise, which should not be confused with the noise we want to eliminate. The next portion has the strongest signal and is called the signal portion. After the strong signal portion comes the weak signal portion, which is called the coda. A point $i$ is said to be a pulse noise if and only if it satisfies the following two conditions:

(1) The absolute magnitude of point $i$, $|a(i)|$, is greater than or equal to the threshold.

(2) The absolute value of $a(i+1) + a(i-1) - 2a(i)$ is greater than or equal to the threshold.

The second condition separates the pulse noise from the strong signal portion, since the pulse noise is much sharper. After point $i$ is detected to be a pulse noise, it can be eliminated by letting

$$a(i) = \frac{a(j) + a(k)}{2}$$

where $j < i$, $k > i$, points $j$ and $k$ are not pulse noise, and no point between $j$ and $k$ is a normal signal point.
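The two conditions and the replacement rule can be sketched in a few lines. This is a minimal illustration, not the program used in the experiment. The function name is hypothetical, the two conditions are given separate threshold parameters that default to the same value (the text speaks only of "the threshold"), and a single threshold is applied to the whole record, whereas the text notes that different regions need different thresholds.

```python
def remove_pulse_noise(a, amp_threshold, sharp_threshold=None):
    """Detect pulse-noise points and replace each by the mean of the
    nearest non-pulse neighbours on either side.

    a: list of samples. Thresholds are user-chosen (assumed parameters).
    """
    if sharp_threshold is None:
        sharp_threshold = amp_threshold
    n = len(a)
    is_pulse = [False] * n
    # End points have no two-sided neighbourhood and are never flagged.
    for i in range(1, n - 1):
        large = abs(a[i]) >= amp_threshold                          # condition (1)
        sharp = abs(a[i + 1] + a[i - 1] - 2 * a[i]) >= sharp_threshold  # condition (2)
        is_pulse[i] = large and sharp

    out = list(a)
    for i in range(n):
        if not is_pulse[i]:
            continue
        j = i - 1
        while is_pulse[j]:      # nearest non-pulse point on the left
            j -= 1
        k = i + 1
        while is_pulse[k]:      # nearest non-pulse point on the right
            k += 1
        out[i] = (a[j] + a[k]) / 2
    return out
```

Calling `remove_pulse_noise([0, 1, 0, -1, 50, 1, 0, -1, 0], 10)` replaces only the spike at index 4 and leaves the other samples untouched, since the neighbours of the spike fail condition (1).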

Figure 3.1(a) is a signal before filtering; (b) is the same signal after filtering. Figure 3.2 is another example, with more than one pulse noise. From these two examples we can see that the local filter works successfully in eliminating the local pulse noise while retaining the original signals.

Another noise problem in seismic signals is drift during recording. As can be seen from Figure 3.3(b), the whole signal is somewhat below the zero line, especially the beginning portion, which is far below it. In order to retain the details of the original signal, we fit a low-order polynomial regression to the original signal and then subtract this regression from it. The fit of the regression is tested by the least-squares criterion. We use a 5th-order polynomial regression for the seismic signals. The regression program is taken from the book by Carnahan, Luther and Wilkes (1969). The entire procedure consists of two parts, i.e., global adjustment and local adjustment. In global adjustment, the polynomial regression is applied to the whole signal, followed by subtraction. Figure 3.3(c) is the result after regression and subtraction applied to Figure 3.3(b). We can see that the small segment at the beginning still drifts slightly from the zero line. We then apply regression and subtraction to this small segment; this is called local adjustment. The result after local adjustment is shown in Figure 3.3(d). Another example is shown in Figure 3.4. Figure 3.4(a) is the original signal, and (b) is the original signal with the zero line. We can see that the first portion of this signal drifts above the zero line and the rest drifts below it. Figure 3.4(c) is the result after global adjustment and (d) is the result after local adjustment. The order of applying global adjustment first and then local adjustment is important; reversing the order does not produce the same result. In our present experiment the segment for local adjustment is selected manually. One alternative is to use piecewise regression to select the optimal breaking point. This is carried out by breaking the whole signal into two segments and then finding the regression of each segment. The breaking point which results in minimum deviation is the optimal breaking point. This must be done on a section of contiguous points. It is time consuming and is therefore excluded from our experiment. After the above preprocessing procedures we can perform segmentation and primitive selection.
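The regression-and-subtraction step can be sketched as below. This is an illustration only: it uses a small normal-equation solver written for this example in place of the regression program cited above, and the function names, the rescaling of the time axis to $[-1, 1]$ and the `start`/`stop` arguments used to express local adjustment are all assumptions of the sketch.

```python
def polyfit_ls(xs, ys, degree):
    """Least-squares polynomial fit; returns c with y ~ sum(c[p] * x**p)."""
    n = degree + 1
    # Normal equations (V^T V) c = V^T y for the Vandermonde matrix V.
    A = [[sum(x ** (r + c) for x in xs) for c in range(n)] for r in range(n)]
    b = [sum(y * x ** r for x, y in zip(xs, ys)) for r in range(n)]
    # Gaussian elimination with partial pivoting.
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(A[r][col]))
        A[col], A[piv] = A[piv], A[col]
        b[col], b[piv] = b[piv], b[col]
        for r in range(col + 1, n):
            f = A[r][col] / A[col][col]
            for c in range(col, n):
                A[r][c] -= f * A[col][c]
            b[r] -= f * b[col]
    # Back substitution.
    coef = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(A[r][c] * coef[c] for c in range(r + 1, n))
        coef[r] = (b[r] - s) / A[r][r]
    return coef


def detrend(signal, degree=5, start=0, stop=None):
    """Subtract a fitted polynomial trend from signal[start:stop].

    With the default range this is a global adjustment; a narrower
    range plays the role of the local adjustment.
    """
    stop = len(signal) if stop is None else stop
    seg = signal[start:stop]
    m = len(seg)
    xs = [2.0 * t / (m - 1) - 1.0 for t in range(m)]  # rescale time to [-1, 1]
    coef = polyfit_ls(xs, seg, degree)
    out = list(signal)
    for t, x in enumerate(xs):
        trend = sum(c * x ** p for p, c in enumerate(coef))
        out[start + t] = seg[t] - trend
    return out
```

A global adjustment followed by a local one on the first ten samples would then read `detrend(detrend(sig), degree=2, start=0, stop=10)`, where the local degree of 2 is an arbitrary choice for the example.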


3.3  Automatic  Clustering  Procedure 
for  Primitive  Selection 

It has been mentioned in Fu (1982) that the pattern primitives should serve as basic pattern elements in describing the structural relations, and that they should be easily extractable, usually by nonsyntactic methods. The selection of primitives depends largely on the type of waveform. In some applications, the primitives are prespecified by a human expert, e.g., in Giese, et al. (1979). We would like to investigate the possibility of nonsupervised learning in primitive selection; therefore, we use an automatic clustering procedure to select the pattern primitives. This is important because human selection of pattern primitives may not always be available and, besides, may be unreliable.

3.3.1  Pattern  Segmentation 

A digitized waveform to be processed by a digital computer is usually sampled from a continuous waveform which represents the phenomena of a source plus external noise. For some cases, such as ECG and carotid pulse wave analysis (Horowitz, 1975; Stockman, et al., 1976), every single peak and valley is significant; therefore these waveforms can be segmented according to shape. For others, like EEG (Giese, et al., 1979) and seismic wave analysis in our case, a single peak or valley does not contain significant information, especially when the signal-to-noise ratio is low; therefore they should be segmented by length, either fixed or variable. A variable-length segmentation is more efficient and precise in representation, but it is usually very difficult and time consuming to find an appropriate segmentation. A fixed-length segmentation is much easier to implement. If the length is well selected, it will be adequate to represent the original waveform. There is a compromise between representation accuracy and analysis efficiency. The shorter the segment is, the more accurate the representation will be, but the analysis becomes less efficient since the string is longer and the computation time is proportional to string length. Another problem is noise: if the segment is too short, it will be very sensitive to noise.

Pattern segmentation is closely related to primitive selection. The segment length in speech analysis is 20 milliseconds (DeMori, 1972, 1977), and 1 second in EEG analysis (Giese, et al., 1979). For short-period seismic signals, a segment length of around 6 seconds is a good choice. A segment of this length contains adequate information and has been used by many other researchers (Chen, 1978; Tjostheim, 1975). Since the sampling frequency of our data set is 10 Hz, a 6-second period contains 60 points.
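Fixed-length segmentation at these settings can be sketched as follows. The function name is hypothetical, and dropping a trailing partial segment is an assumption of the sketch; the records in the experiment all have equal length, so the issue may not arise there.

```python
def fixed_length_segments(samples, sampling_hz=10, segment_seconds=6):
    """Split a record into consecutive fixed-length segments.

    With 10 Hz sampling and 6-second segments each segment has 60 points.
    A trailing partial segment is dropped (assumed convention).
    """
    length = int(round(sampling_hz * segment_seconds))
    n_full = len(samples) // length
    return [samples[k * length:(k + 1) * length] for k in range(n_full)]
```

Each segment is later mapped to one primitive, so a record of $n$ points yields a string of $\lfloor n/60 \rfloor$ primitives under these settings.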

We have also experimented with other segment lengths, namely 40 points and 80 points. We selected 41 explosion records out of 111 and 59 earthquake records out of 210 as training samples. The recognition rate for the 60-point segment length is 91.0%, i.e., 20 misclassifications out of 221. When we chose 40 points as the segment length, the best primitive number according to the primitive selection procedure in Section 3.4 was 18. For primitive number 18, the recognition rate is 72.9%, i.e., 60 misclassifications out of 221. If we chose primitive number 13, as we did for the 60-point segment length, the recognition rate is still 72.9%, though the details of the classification are different. When we chose 80 points as the segment length, the selected primitive number is 14 and the recognition rate is 73.8%, i.e., 58 misclassifications out of 221.
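The reported rates can be checked directly. The denominator 221 is presumably the number of records left for testing, i.e., the 321 records in total minus the 100 training samples; this is an inference from the figures above rather than something stated explicitly.

$$1 - \frac{20}{221} \approx 0.910, \qquad 1 - \frac{60}{221} \approx 0.729, \qquad 1 - \frac{58}{221} \approx 0.738.$$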

Although this experiment is by no means conclusive, it does show that a segment length of 60 points is an appropriate selection for short-period seismic signals. A shorter segment is too sensitive to noise and a longer segment is too complicated for a primitive. The selection of segment length is usually a subjective judgment and depends on the characteristics of the signal waveform.

3.3.2  Feature  Selection 

Any linear or nonlinear mapping of the original measurements can
be considered as features provided they have discriminating capability.
Both time-domain and frequency-domain features have been used for
seismic discrimination. For example, complexity and autoregressive
models are time-domain features; spectral ratio and third moment of
frequency are frequency-domain features (Dahlman and Israelson, 1977).
Since we segment the seismic wave, complexity and spectral ratio
features are implicitly contained in the string structure.
Furthermore, a short segment may not contain enough points for model
estimation. Therefore, we selected a pair of commonly used features,
i.e., zero-crossing count and log energy of each segment, which are
easy to compute and contain significant information. Ease of
computation is a desirable property for primitive extraction in the
syntactic approach. The zero-crossing count roughly represents the
major frequency component of the signal, and the log energy indicates
the magnitude of the signal. These two features should be able to
characterize the signal segment. Other features may also serve as good
candidates. An advantage of the syntactic approach is that feature
selection is simpler, since features are extracted from smaller
segments, and is not as critical as it is in the statistical approach.
Since there is no optimal feature selection algorithm, features are
usually selected subjectively. Although there are criteria such as
between-cluster and within-cluster scatter, they have no direct
relation to the final recognition results. Since other features,
including the K-L expansion, did not show any superiority in
recognition results in our preliminary experiments, we will stay with
the zero-crossing count and log energy.
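
The two features can be illustrated with a short Python sketch. It is
only a sketch under stated assumptions: the text does not give the exact
definitions used in the experiments (which were run in Pascal), so the
sign-change rule for zero crossings, the use of the natural logarithm of
the sum of squares as log energy, the floor value that guards against
log(0), and the function name `segment_features` are all assumptions.

```python
import math


def segment_features(record, segment_length=60):
    """Return a (zero_crossing_count, log_energy) pair per full segment.

    Assumed definitions (not taken from the report):
    - a zero crossing is a sign change between consecutive samples;
    - log energy is the natural log of the sum of squared samples.
    """
    features = []
    last_start = len(record) - segment_length
    for start in range(0, last_start + 1, segment_length):
        segment = record[start:start + segment_length]
        crossings = sum(
            1
            for prev, cur in zip(segment, segment[1:])
            if (prev < 0) != (cur < 0)
        )
        energy = sum(x * x for x in segment)
        # Floor the energy so that an all-zero segment does not hit log(0).
        features.append((crossings, math.log(max(energy, 1e-12))))
    return features


# A 1200-point record at 10 Hz yields 20 segments of 60 points each.
record = [math.sin(2.0 * math.pi * 0.5 * n / 10.0) for n in range(1200)]
print(len(segment_features(record)))
```

Each resulting pair would then be normalized and assigned to the nearest
cluster center to obtain the primitive symbol of the segment.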

Since we are experimenting with a new approach to seismic
discrimination, we do not particularly emphasize feature selection. In
fact, simple features like these give favorable results in our
experiment. This indicates that the syntactic approach relies on
structural information rather than on sophisticated feature measurement.

3.3.3  Primitive  Recognition 

The selection of primitives varies widely in digital signal
recognition. Line segments from linear approximation of signals have
been used in ECG analysis (Horowitz, 1975, 1977). Parabolas and line
segments have been used in carotid pulse wave analysis (Stockman, et
al., 1976). These primitives are mainly used to describe the shape of
the signal waveform. When the shape of the signal waveform is not
important, other types of primitives must be selected. For example, in
spoken word recognition (DeMori, 1972, 1977), silence intervals, stable
zones and lines are used as primitives. In EEG analysis (Giese, et al.,
1979), a group of seven primitives has been specified and a linear
classifier is used to recognize the testing segments. What should we do
if the signal at hand is not as predictable as a speech signal, and we
cannot specify the primitives as in EEG analysis? One possible solution
is a clustering procedure. A clustering procedure classifies any number
of signal segments into a given number of clusters in an optimal way,
i.e., by minimizing some criterion function.

If the number of primitives, i.e., the number of clusters, has been
selected, then any typical clustering technique, e.g., the K-means
algorithm, can be used to find a clustering that minimizes the
criterion function. The difficult part is how to select an appropriate
number of primitives. For example, in EEG analysis, how do we know that
seven is the best selection? Is there a better selection? How does the
number of primitives affect the final recognition results? We will
discuss all of these questions in this section.

Without loss of generality we assume that each signal segment is
represented by a vector of features $x = [x_1, x_2, \ldots, x_k]^t$.
Note that we use a decision-theoretic approach for primitive selection.
Other representations may also serve the purpose as long as the
similarity between signal segments can be computed. If the feature
space is isotropic, then the Euclidean distance can be used as a
measure of similarity, and it is invariant under translation or
rotation. It is not, however, invariant under general linear
transformations such as rescaling of individual features; that
invariance can be attained by normalizing the data before clustering.

Suppose we want to partition $n$ samples $x_1, x_2, \ldots, x_n$ into
$k$ disjoint subsets $C_1, C_2, \ldots, C_k$. Each subset represents a
cluster. The samples in the same cluster are more similar than the
samples in different clusters. One typical approach is to define a
criterion function that measures the clustering quality of any
partition of the samples. Then the problem is to minimize or maximize
the criterion function. One of the best-known criterion functions is
the sum-of-squared-error criterion (Duda and Hart, 1973). Let $n_i$ be
the number of samples in cluster $C_i$ and $m_i$ be the mean of those
samples, where

$$m_i = \frac{1}{n_i} \sum_{x \in C_i} x$$

The sum-of-squared-error criterion is defined as

$$J_e = \sum_{i=1}^{k} \sum_{x \in C_i} \| x - m_i \|^2$$


Another set of criterion functions is derived from scatter matrices.
First, let us introduce some definitions.

Mean vector for the ith cluster:

$$m_i = \frac{1}{n_i} \sum_{x \in C_i} x$$

Total mean vector:

$$m = \frac{1}{n} \sum_{x \in C} x = \frac{1}{n} \sum_{i=1}^{k} n_i m_i$$

Scatter matrix for the ith cluster:

$$S_i = \sum_{x \in C_i} (x - m_i)(x - m_i)^t$$

Within-cluster scatter matrix:

$$S_W = \sum_{i=1}^{k} S_i$$

Between-cluster scatter matrix:

$$S_B = \sum_{i=1}^{k} n_i (m_i - m)(m_i - m)^t$$

Total scatter matrix:

$$S_T = \sum_{x \in C} (x - m)(x - m)^t$$

It follows that $S_T = S_W + S_B$.

We define the optimal partition as one that minimizes $S_W$ or
maximizes $S_B$. In doing so we need a scalar measure of the size of a
scatter matrix. The trace of $S_W$ is the simplest measure. Other
well-known measures are the determinant of $S_W$ and the trace of
$S_W^{-1} S_B$. For the sake of computational simplicity we will only
consider the trace of $S_W$ as the criterion function. The trace
criterion is defined as:

$$\mathrm{tr}\, S_W = \sum_{i=1}^{k} \mathrm{tr}\, S_i = \sum_{i=1}^{k} \sum_{x \in C_i} \| x - m_i \|^2 = J_e$$

which is exactly the same as the sum-of-squared-error criterion. Since
$\mathrm{tr}\, S_T = \mathrm{tr}\, S_B + \mathrm{tr}\, S_W$ and
$\mathrm{tr}\, S_T$ is independent of how the samples are partitioned,
minimizing $\mathrm{tr}\, S_W$ is equivalent to maximizing
$\mathrm{tr}\, S_B$, where

$$\mathrm{tr}\, S_B = \sum_{i=1}^{k} n_i \| m_i - m \|^2$$
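
The trace identity can be checked numerically on a toy partition. The
following Python sketch is illustrative only; the helper names (`mean`,
`scatter`, `trace`, `scatter_traces`) and the four sample points are
assumptions and are not part of the original experiments.

```python
def mean(vectors):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(v[d] for v in vectors) / n for d in range(len(vectors[0]))]


def scatter(vectors, center):
    """Scatter matrix: sum of outer products (x - center)(x - center)^t."""
    dim = len(center)
    s = [[0.0] * dim for _ in range(dim)]
    for v in vectors:
        diff = [v[d] - center[d] for d in range(dim)]
        for r in range(dim):
            for c in range(dim):
                s[r][c] += diff[r] * diff[c]
    return s


def trace(matrix):
    return sum(matrix[d][d] for d in range(len(matrix)))


def scatter_traces(clusters):
    """Return (tr S_W, tr S_B, tr S_T) for a list of clusters."""
    all_points = [v for cluster in clusters for v in cluster]
    m = mean(all_points)
    tr_sw = sum(trace(scatter(c, mean(c))) for c in clusters)
    tr_sb = sum(
        len(c) * sum((mi - mt) ** 2 for mi, mt in zip(mean(c), m))
        for c in clusters
    )
    tr_st = trace(scatter(all_points, m))
    return tr_sw, tr_sb, tr_st


clusters = [[(0.0, 0.0), (2.0, 0.0)], [(10.0, 0.0), (12.0, 0.0)]]
print(scatter_traces(clusters))  # (4.0, 100.0, 104.0)
```

For this partition tr S_W = 4 and tr S_B = 100, which sum to
tr S_T = 104; any other grouping of the same four points changes the
split between the two terms but not their total.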


If the number of clusters is known, then the K-means algorithm can
be applied to find a clustering which minimizes the criterion function,
i.e., the sum-of-squared-error $J_e$. When the number of clusters is
unknown, at least two approaches can be used to determine the optimal
number of clusters. These two approaches turn out to give similar
results in our experiment.

Both approaches use a bottom-up hierarchical clustering procedure.
This algorithm repeats the clustering procedure for $k = U$,
$k = U - 1$, ..., $k = L$, where $U$ and $L$ are the specified upper and
lower bounds, respectively. The first approach selects the optimal
number of clusters by examining how the criterion function $J_e$
changes with $k$. If the $n$ samples are really grouped into $p$
well-separated clusters, then $J_e$ should increase slowly as $k$
decreases to $p$ and then increase much more rapidly thereafter. The
bottom-up clustering procedure is as follows:

Algorithm 3.1 Bottom-Up Hierarchical Clustering

Input: A set of $n$ unclassified samples, an upper bound $U$
and a lower bound $L$.

Output: A sequence of optimal clusterings for the number
of clusters between $U$ and $L$.

Method:

(1) Let $k = U$, where $k$ is the number of clusters, and arbitrarily
assign cluster membership.

(2) Reassign membership using the K-means algorithm. If
$k \le L$, stop.

(3) Find the nearest pair of clusters, say $C_i$ and $C_j$, $i \ne j$.

(4) Merge $C_i$ and $C_j$, delete $C_j$ and decrease $k$ by one,
go to step (2).

The distance between two clusters is defined by

$$d(C_i, C_j) = \| m_i - m_j \|$$

where $m_i$, $m_j$ are the mean vectors of clusters $i$, $j$ respectively.
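
The four steps of Algorithm 3.1 can be sketched in Python as follows.
This is a minimal illustration, not the program used in the
experiments: seeding the clusters with the first U samples is one
assumed way of making the arbitrary initial assignment, and all
function names are hypothetical.

```python
def mean(points):
    n = len(points)
    return tuple(sum(p[d] for p in points) / n for d in range(len(points[0])))


def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def k_means(points, centers, max_iter=100):
    """Reassign membership until it stops changing; return (centers, labels)."""
    labels = None
    for _ in range(max_iter):
        new_labels = [
            min(range(len(centers)), key=lambda c: sq_dist(p, centers[c]))
            for p in points
        ]
        if new_labels == labels:
            break
        labels = new_labels
        for c in range(len(centers)):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # an emptied cluster keeps its previous center
                centers[c] = mean(members)
    return centers, labels


def bottom_up_clustering(points, upper, lower):
    """Return {k: (centers, labels, J_e)} for k = upper down to lower."""
    # Step 1 (assumption): seed the clusters with the first `upper` samples.
    centers = [tuple(p) for p in points[:upper]]
    results = {}
    while True:
        centers, labels = k_means(points, centers)           # step 2
        k = len(centers)
        j_e = sum(sq_dist(p, centers[lab]) for p, lab in zip(points, labels))
        results[k] = (list(centers), list(labels), j_e)
        if k <= lower:
            return results
        i, j = min(                                          # step 3
            ((a, b) for a in range(k) for b in range(a + 1, k)),
            key=lambda pair: sq_dist(centers[pair[0]], centers[pair[1]]),
        )
        merged = [p for p, lab in zip(points, labels) if lab in (i, j)]
        if merged:                                           # step 4
            centers[i] = mean(merged)
        del centers[j]


points = [(0, 0), (10, 0), (20, 0), (0, 1), (10, 1), (20, 1)]
for k, (_, labels, j_e) in sorted(bottom_up_clustering(points, 3, 2).items()):
    print(k, labels, j_e)
```

On this toy set the criterion rises from 1.5 at k = 3 to 101.5 at
k = 2, the kind of abrupt increase that the first selection approach
looks for.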

Just as the F-statistic can be used in the univariate case to test the
significance of group separation, a pseudo F-statistic (PFS) can be
applied in the multivariate case provided that a single measure of
similarity between samples, e.g., Euclidean distance, is assumed (Vogel
and Wong, 1978). The pseudo F-statistic is defined as:

$$\mathrm{PFS} = \frac{\mathrm{tr}\, S_B \,(n - k)}{\mathrm{tr}\, S_W \,(k - 1)}$$

As the number of clusters increases, $\mathrm{tr}\, S_B$ will always
increase while $\mathrm{tr}\, S_W$ will always decrease. However, the
PFS value will not increase monotonically, because the factor
$(n - k)/(k - 1)$ becomes smaller as $k$ becomes larger. Therefore,
there will be a peak of the PFS value somewhere in the middle. Since,
like the F-statistic, the PFS shows the significance of group
separation, a larger PFS value means the clusters are more compact and
better separated. The criterion here is to select the maximum PFS value;
the corresponding number of clusters is taken as optimal. For example,
in Figure 3.7, the maximum PFS value appears at cluster number 13,
therefore 13 is the optimal selection for the number of clusters.
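
As a small worked example of the PFS criterion, the sketch below
evaluates the statistic for two partitions of the same six points. The
function name `pseudo_f` and the sample data are assumptions made for
illustration; the function expects at least two clusters and a nonzero
within-cluster scatter.

```python
def pseudo_f(clusters):
    """PFS = tr S_B (n - k) / (tr S_W (k - 1)) for a list of clusters.

    Each cluster is a non-empty list of equal-length vectors. The value is
    undefined for k = 1 or when every cluster has zero scatter.
    """
    k = len(clusters)
    points = [p for c in clusters for p in c]
    n = len(points)
    dim = len(points[0])
    total_mean = [sum(p[d] for p in points) / n for d in range(dim)]
    tr_sw = 0.0
    tr_sb = 0.0
    for c in clusters:
        m = [sum(p[d] for p in c) / len(c) for d in range(dim)]
        tr_sw += sum(sum((p[d] - m[d]) ** 2 for d in range(dim)) for p in c)
        tr_sb += len(c) * sum((m[d] - total_mean[d]) ** 2 for d in range(dim))
    return (tr_sb * (n - k)) / (tr_sw * (k - 1))


group_a = [(0, 0), (0, 1)]
group_b = [(10, 0), (10, 1)]
group_c = [(20, 0), (20, 1)]

three_clusters = [group_a, group_b, group_c]
two_clusters = [group_a + group_b, group_c]
print(pseudo_f(three_clusters), pseudo_f(two_clusters))
```

With three clusters tr S_B = 400 and tr S_W = 1.5, giving PFS = 400;
merging two of the groups drops the value to about 11.8, so the maximum
PFS picks the three-cluster partition.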

3.4 Syntax Analysis

If classification is all we need, then the nearest-neighbor decision
rule is preferred because of its computational efficiency. On the
other hand, if a complete description of the waveform structure is
needed, we have to use parsing (or error-correcting parsing). An
error-correcting parser (instead of a conventional parser) is required
for most practical pattern recognition applications, since noise and
distortion usually cause conventional parsers to fail. It is not unusual
that even a noise-free, distortion-free pattern cannot be recognized by
a conventional parser, since the pattern grammar is often inferred from
a small set of training samples.

3.4.1 Nearest-Neighbor Decision Rule

The concept of the nearest-neighbor decision rule in the syntactic
approach is similar to that in the decision-theoretic approach. The only
difference is in the distance calculation. Four types of string
distances have been discussed in Chapter 2, and they can be computed
using dynamic programming (e.g., Algorithm 2.1).
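
A minimal Python sketch of the rule is given below. It does not
reproduce Algorithm 2.1; instead it assumes equal-length strings, so
that the string distance reduces to a sum of substitution weights. The
function names, the two training strings and their labels are
hypothetical; the three weights are taken from Table 3.3.

```python
# Substitution weights between primitives (values from Table 3.3).
WEIGHTS = {("a", "b"): 0.95, ("a", "c"): 0.29, ("b", "c"): 0.67}


def weight(a, b):
    """Symmetric substitution weight; zero when the symbols agree."""
    if a == b:
        return 0.0
    return WEIGHTS[(a, b)] if (a, b) in WEIGHTS else WEIGHTS[(b, a)]


def substitution_distance(x, y):
    """Weighted distance between equal-length strings (substitutions only)."""
    if len(x) != len(y):
        raise ValueError("strings must have equal length")
    return sum(weight(a, b) for a, b in zip(x, y))


def nearest_neighbor(test_string, training_set):
    """Return the label of the training string closest to test_string."""
    _, label = min(
        training_set,
        key=lambda item: substitution_distance(test_string, item[0]),
    )
    return label


# Hypothetical training strings, far shorter than the 20-symbol sentences.
training_set = [("aab", "explosion"), ("bbc", "earthquake")]
print(nearest_neighbor("acb", training_set))  # explosion
```

Here "acb" is at distance 0.29 from "aab" and 2.29 from "bbc", so it
receives the label of the first training string.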

3.4.2  Error-Correcting  Finite-State  Parsing 

Before  parsing  can  take  place  we  must  have  a  grammar,  which  can 
be  either  heuristically  constructed  or  inferred  from  a  set  of  training 
samples.  In  order  to  study  the  learning  capability  of  the  syntactic 
method,  we  choose  the  grammatical  inference  approach. 

Phrase structure grammars have been used to describe patterns in
syntactic pattern recognition (see Fu, 1982). Each pattern is
represented by a string of primitives which corresponds to a sentence
in a language (a tree or graph in high-dimensional grammars). All
strings which belong to the same class are generated by one grammar.

Grammatical  Inference 

A set of sentences $S^+$ is a positive sample of a language $L(G)$ if
$S^+ \subseteq L(G)$. A set of sentences $S^-$ is a negative sample of a
language $L(G)$ if $S^- \subseteq \overline{L(G)}$.

A positive sample $S^+$ of a language $L(G)$ is structurally complete if
each production in $G$ is used in the generation of at least one string
in $S^+$ (Fu and Booth, 1975).

We assume that the set $S^+$ is structurally complete and
$S^+ \subseteq L(G_D)$, where $G_D$ is the inferred grammar.
Theoretically, if $S^+$ is a structurally complete sample of the
language $L(G)$ generated by the finite-state grammar $G$, then the
canonical grammar $G_C$ can be inferred from $S^+$. A set of derived
grammars can be obtained from $G_C$ by partitioning the set of
nonterminals of the canonical grammar into equivalence classes. Each
nonterminal of a derived grammar corresponds to one block of the
partition. Since the number of possible partitions is too large, it is
infeasible to evaluate all of them. Therefore, algorithms such as the
k-tail algorithm (Biermann and Feldman, 1972) have been suggested to
reduce the number of derived grammars. These algorithms have one
disadvantage: the reduced subset of derived grammars may not contain
the source grammar. However, this is sufficient if we are only
interested in an estimate of the source grammar.

There are at least two situations where a grammatical inference
algorithm can be used. In the first case, there exists a source grammar
which generates a language, and we want to infer the source grammar or
automaton from the observed samples. In the second case, the exact
nature of the source grammar is unknown; the only information we have
is some sentences generated by the source. We assume that the source
grammar falls into a particular class and infer a grammar which
generates all the training samples and hopefully will generate other
samples belonging to the same class. If a negative sample set is given,
the inferred grammar must not generate any sample in the negative
sample set. Grammars more complex than finite-state grammars and
restricted context-free grammars (in the Chomsky hierarchy) cannot be
inferred efficiently without human interaction. Furthermore, since
there is no obvious self-embedding property in seismic waves,
finite-state grammars will be sufficient in generating power. Therefore
we choose finite-state grammars to describe the seismic waves.

The inference of regular grammars has been studied extensively.
The k-tail algorithm finds the canonical grammar and then merges the
states which are k-tail equivalent. This algorithm is adjustable: the
value of $k$ controls the size of the inferred grammar. Another
algorithm, called the tail-clustering algorithm (Miclet, 1980), also
finds the canonical grammar, but then merges the states which have
common tails. The original algorithm is not as flexible as the k-tail
algorithm, but will infer a grammar which is closer to the source
grammar in some cases. We can modify the merge criterion to make it
more flexible. Since the grammar is inferred from a small set of
training samples, we can only expect that the inferred grammar generates
all the training samples and will generate other strings which are
similar to the training samples. The generating power of the inferred
grammar relies entirely on the merge procedure. If no merge occurs at
all, then the inferred grammar generates exactly the training set, no
more and no less. Since all the seismic records have the same length and
alignment in our experiment, the sentences representing these signals
also have the same length.
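
The merging step of the k-tail algorithm can be sketched in Python as
follows. This is an illustrative reading, not the implementation used
in the experiments: each distinct prefix of a training string is taken
as one state of the canonical automaton, the k-tail of a state is
assumed to be the set of its tails of length at most k, and the
function name `k_tail_merge` is hypothetical. State counts produced
this way need not match the nonterminal counts reported for the
experiments, since the counting conventions may differ.

```python
from collections import defaultdict


def k_tail_merge(samples, k):
    """Merge prefix-tree states whose k-tails coincide.

    Returns (number_of_states, transitions, final_states), where
    transitions is a set of (state, symbol, next_state) triples over the
    merged state ids. The merged automaton may be nondeterministic.
    """
    # Collect, for every prefix (state), its tails of length <= k.
    tails = defaultdict(set)
    for s in samples:
        for i in range(len(s) + 1):
            state_tails = tails[s[:i]]  # creates the state even if no tail fits
            if len(s) - i <= k:
                state_tails.add(s[i:])

    # States with identical k-tail sets fall into the same block.
    block_ids = {}
    block_of = {}
    for prefix in sorted(tails, key=lambda p: (len(p), p)):
        key = frozenset(tails[prefix])
        block_of[prefix] = block_ids.setdefault(key, len(block_ids))

    transitions = set()
    for s in samples:
        for i in range(len(s)):
            transitions.add((block_of[s[:i]], s[i], block_of[s[:i + 1]]))
    finals = {block_of[s] for s in samples}
    return len(block_ids), transitions, finals


samples = ["abc", "abd", "acd"]
for k in (3, 1, 0):
    n_states, _, _ = k_tail_merge(samples, k)
    print(k, n_states)
```

For these three equal-length strings the state count falls from 5 at
k = 3 to 4 at k = 1 and 2 at k = 0. The states merged first are the
ones nearest the initial state, because their tails are longer than k
and their k-tail sets are therefore empty.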

Error-Correcting  Parsing 

After a grammar is available, either by automatic inference or by
manual construction, the next step is to design a recognizer which will
recognize the patterns generated by the grammar. If the grammar $G$ is
finite-state, a deterministic finite-state automaton can be constructed
to recognize the strings generated by $G$.

Segmentation and primitive recognition errors due to noise and
distortion usually occur in practice. Conventional parsing algorithms
cannot handle these situations; therefore, an error-correcting parser
must be used (Fu, 1977).

Since all the sentences in our example have the same length, only
the substitution error needs to be considered. For each production
$A \to aB$ or $A \to a$ in the original grammar we add $A \to bB$ or
$A \to b$, respectively, to the covering grammar, where $A, B \in N$,
$a, b \in \Sigma$, $b \ne a$, $N$ is a set of nonterminal symbols and
$\Sigma$ is a set of terminal symbols. Different weights can be assigned
to different error productions, which results in a minimum-cost
error-correcting parser. The assignment of weights is a crucial problem.
We have used the distance between clusters $a$ and $b$ as the weight for
substituting $a$ by $b$ and vice versa. Since a finite-state grammar can
be represented by a transition diagram, minimum-cost error-correcting
parsing is equivalent to finding a minimum-cost path from the initial
state to a final state.

Algorithm 3.2. Computation of Minimum-Cost

Input: A transition diagram with $n$ nodes numbered $1, 2, \ldots, n$,
where node 1 is the initial state and node $n$ is a final state,
and a cost function $C_{ij}(a)$, for $1 \le i, j \le n$, $a \in \Sigma$,
with $C_{ij}(a) \ge 0$ for all $i$ and $j$. An input string $s$.

Output: $m_{1n}$, the lowest cost of any path from node 1 to node $n$
whose sequence is equal to that of the input string $s$.

Method:

(1) Set $k = 1$, $m_{11} = 0$, and $m_{1j} = \infty$ for all
$1 < j \le n$.

(2) For all $1 \le j \le n$, set
$m_{1j} = \min_{1 \le i \le n} \{ m_{1i} + C_{ij}(b) \}$, where $b$ is
the $k$th symbol of the input string $s$ and the values $m_{1i}$ on the
right-hand side are those obtained before this step.

(3) If $k < |s|$, increase $k$ by 1 and go to step (2). If $k = |s|$, go
to step (4).

(4) Output $m_{1n}$, which is the lowest cost from node 1 to node $n$
following the moves of the input string $s$. Stop.

The cost function $C_{ij}(a)$ denotes the cost of moving from state $i$
to state $j$ while the input symbol is $a$, and $m_{1j}$ is the minimum
cost from state 1 to state $j$. For a fixed transition diagram, the
computation time of Algorithm 3.2 is linear in the length of the input
string, i.e., $O(|s|)$. This algorithm is a finite-state parsing
algorithm where only the substitution error is considered. The
production number can be stored with $C_{ij}(a)$, and the parse can be
stored with $m_{1j}$.
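
A compact Python sketch of this minimum-cost computation is shown
below. It is an illustration under assumptions: states are numbered
from 0 rather than 1, the cost of a move is taken as the substitution
weight between the symbol on the production and the input symbol, the
loop runs over productions rather than over all state pairs, and the
names `min_cost_parse`, `WEIGHTS` and `weight` are hypothetical. The
three weights are taken from Table 3.3.

```python
import math

# Substitution weights between primitives (values from Table 3.3).
WEIGHTS = {("a", "b"): 0.95, ("a", "c"): 0.29, ("b", "c"): 0.67}


def weight(a, b):
    """Symmetric substitution weight; zero when the symbols agree."""
    if a == b:
        return 0.0
    return WEIGHTS[(a, b)] if (a, b) in WEIGHTS else WEIGHTS[(b, a)]


def min_cost_parse(n_states, transitions, s, start=0, final=None):
    """Minimum substitution cost of driving the automaton along string s.

    transitions: iterable of (i, a, j), a move from state i to state j on
    terminal a. Reading symbol b on that move costs weight(a, b).
    Returns math.inf when the final state cannot be reached in len(s) moves.
    """
    if final is None:
        final = n_states - 1
    cost_to = [math.inf] * n_states
    cost_to[start] = 0.0
    for b in s:
        updated = [math.inf] * n_states
        for i, a, j in transitions:
            candidate = cost_to[i] + weight(a, b)
            if candidate < updated[j]:
                updated[j] = candidate
        cost_to = updated
    return cost_to[final]


# Transition diagram of a grammar that generates only the string "ab".
transitions = [(0, "a", 1), (1, "b", 2)]
for s in ("ab", "cb", "ac"):
    print(s, min_cost_parse(3, transitions, s))
```

An input string would then be assigned to the class whose grammar
yields the lowest cost.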


If insertion and deletion errors are to be considered, then the
parser is still similar, except that we have to compute and store the
information $V(T, S, a)$, which is the minimum cost of changing
character $a$ into some string which can change the state of the
automaton from state $T$ to state $S$ (Wagner, 1974). The inclusion of
insertion and deletion errors makes the error correction more complete,
but assigning appropriate weights to insertion and deletion errors is
even more difficult.

3.5 Experimental Results on Seismic Discrimination

The seismic data used in our experiments were provided by Professor
C. H. Chen of Southeastern Massachusetts University. The data were
recorded at LASA in Montana. Each record contains 1200 points; the
sampling frequency is 10 points per second. The original data set
contains 323 records. Due to some technical problems in data
conversion, only 321 records were received. Among them, 111 records are
nuclear explosions and 210 records are earthquakes.

We have selected forty-one explosion records and fifty-nine earthquake
records as training samples. Each record is divided into 20 segments,
where each segment contains 60 points. Two features, i.e., zero-crossing
count and log energy, are computed from each segment. Table 3.1 shows
the criterion function $J_e$ and its increment from cluster number 16
down to 2, which are the results of applying Algorithm 3.1 to the
training segments. We can see that the increment of $J_e$ is small
until the cluster number reaches 13 and becomes much larger thereafter.
Therefore, we say that 13 is an optimal selection of the cluster number.
Also shown in Table 3.1 are the recognition results for different
cluster number selections. The number of clusters is equivalent to the
number of primitives. The selection of 13 clusters gives the best
recognition result. The $\mathrm{tr}\, S_B$ curve, which is
monotonically increasing, is shown in Figure 3.5, and the
$\mathrm{tr}\, S_W$ curve, which is monotonically decreasing, is shown
in Figure 3.6. The PFS curve is shown in Figure 3.7. The maximum PFS
value appears at cluster number 13, which is identical to the selection
in the previous approach.

TABLE 3.1

The criterion function, increments of criterion
function and the classification results of
different cluster number selections

Cluster    Criterion    Increment    Classif.
No.        function     of c. f.     %

16           359           -          80.1
15           374          15          81.9
14           392          18          85.5
13           416          24          91.0
12           456          40          84.6
11           510          54          83.7
10           565          55          85.5
 9           632          67          81.9
 8           698          66          76.5
 7           783          85          68.8
 6           899         116          57.9
 5          1069         170          64.3
 4          1360         291          57.9
 3          1756         396           -
 2          2464         708           -

Although there is a secondary peak at cluster number 6 in Figure
3.7, it does not have any significant meaning. The recognition
results of Table 3.1 show no indication of a peak at that location.
However, there does exist a secondary peak in recognition accuracy,
which occurs at cluster number 10. The possible reasons for these
phenomena are, first, that our seismic samples are not very compact
and well separated and, second, that we reassign membership after each
merge, which may affect the PFS value and recognition results. In spite
of the secondary peak, the selection of the dominant peak gives the
best results and should be the rule to follow.

The centers of the 13 clusters and the number of members in each
cluster are shown in Table 3.2. The cluster centers are further plotted
in the two-dimensional feature plane in Figure 3.8. Portions (17
segments) of two examples, a typical explosion and a typical
earthquake, are given in Figure 3.9, with both the original waveforms
and their string representations. The second segments of the two
waveforms look the same but have different primitive assignments. This
is because both symbols 'a' and 'c' have very small magnitudes compared
with the other symbols (see Figure 3.8); therefore the frequency
difference cannot be seen due to the resolution of the drawing.

Algorithm 2.1 is applied for string distance computation, and the
nearest-neighbor decision rule is used for classification. Since all
the records have equal length and alignment, only substitution errors
are considered. The weights for substitution errors are given in Table
3.3. The weight between pattern primitives is defined as the normalized
distance between the corresponding clusters. Classification results and
computation time for the 221 test samples are shown in Table 3.4, where
201 records are correctly classified, i.e., a 91% correct rate, with an
average time of 0.07 sec for each record. The experiments were run on a
VAX 11/780 computer using the Pascal programming language.

RD-A124  298  A  SYNTACTIC  APPROACH  AND  VLSI  ARCHITECTURES  FOR  SEISMIC  2/3
SIGNAL  CLASSIFICATIONS)  PURDUE  UNIV  LAFAVETTE  IN
SCHOOL  OF  ELECTRICAL  ENGINEERING  H  LIU  ET  AL.  JAN  83
UNCLASSIFIED  N88814-79-C-0574  F/G  8/11  NL

Figure 3.5 tr S_B increases as the number of clusters increases.

Figure 3.6 tr S_W decreases as the number of clusters increases.

TABLE 3.2

The center of the 13 clusters, the number
of members in each cluster and the primitive
symbol of each cluster.

Cluster   Feature 1          Feature 2       No. of     Primitive
No.       (zero-crossing     (log energy)    members    symbol
          count)

 1        -1.718192          -2.108372          67         a
 2         3.336939          -1.740116          36         b
 3        -0.180208          -2.387472          43         c
 4        -1.229273           0.987182         187         d
 5         0.467317           1.048923         179         e
 6         0.426978           0.113834         233         f
 7        -0.407192           1.283638         209         g
 8        -0.320940           0.440148         245         h
 9         1.431115           0.168968          73         i
10        -0.306735          -0.573480         211         j
11         1.485801          -0.940290         145         k
12        -1.413536          -0.255781         116         l
13         0.476520          -0.756842         256         m

We use the k-tail finite-state inference algorithm to infer pattern
grammars for the seismic waves. When $k \ge 19$, the inferred grammar is
exactly the same as the canonical grammar. When $k < 19$, some
equivalent states are merged, which results in fewer states and
productions. The number of states and productions for various values of
$k$ is shown in Table 3.5; it gets smaller as $k$ gets smaller. The
average parsing time of one string and the percentage of correct
classification for different $k$ are given in Table 3.6. The parsing
time is shorter when $k$ is smaller. This is due to the smaller number
of productions and states. On the other hand, the percentage of correct
classification is also smaller when $k$ is smaller. This is because
derived grammars generate strings which do not belong to the positive
sample set. Another reason for the worse performance is that in our case
only those states with the longest tails are merged. In terms of the
transition diagram, this means that only those states which are close to
the initial state are merged, because the k-tails of those states are
empty and therefore only they are k-tail equivalent. This is the
consequence when all the training samples have equal length. Normally,
the merged states should be distributed uniformly between the initial
and final states. One final note about Table 3.6 is that the decrease of
parsing time holds in any case, but the decrease of correct percentage
may not hold in other cases, because the experimental results of our
limited data set are neither representative nor conclusive.

TABLE 3.3

Weights for substitution error

      a     b     c     d     e     f     g     h     i     j     k     l     m

a     0    0.95  0.29  0.59  0.72  0.58  0.68  0.55  0.73  0.39  0.64  0.35  0.49
b    0.95   0    0.67  1.00  0.75  0.65  0.91  0.80  0.51  0.72  0.38  0.94  0.57
c    0.29  0.67   0    0.66  0.66  0.48  0.69  0.53  0.57  0.34  0.42  0.46  0.33
d    0.59  1.00  0.66   0    0.32  0.35  0.16  0.20  0.52  0.34  0.63  0.24  0.46
e    0.72  0.75  0.66  0.32   0    0.18  0.17  0.19  0.25  0.34  0.42  0.43  0.34
f    0.58  0.65  0.48  0.35  0.18   0    0.27  0.15  0.19  0.19  0.28  0.35  0.16
g    0.68  0.91  0.69  0.16  0.17  0.27   0    0.16  0.40  0.35  0.55  0.35  0.42
h    0.55  0.80  0.53  0.20  0.19  0.15  0.16   0    0.33  0.19  0.43  0.24  0.27
i    0.73  0.51  0.57  0.52  0.25  0.19  0.40  0.33   0    0.36  0.21  0.54  0.25
j    0.39  0.72  0.34  0.34  0.34  0.19  0.35  0.19  0.36   0    0.34  0.22  0.15
k    0.64  0.38  0.42  0.63  0.42  0.28  0.55  0.43  0.21  0.34   0    0.56  0.19
l    0.35  0.94  0.46  0.24  0.43  0.35  0.35  0.24  0.54  0.22  0.56   0    0.37
m    0.49  0.57  0.33  0.46  0.34  0.16  0.42  0.27  0.25  0.15  0.19  0.37   0

TABLE 3.4

Classification results using
nearest-neighbor decision rule

Average time for       Percentage of
one string (sec)       correct classification

0.07                   91.0 %
                       (201 records are correctly
                       classified out of 221)

TABLE 3.5

The number of nonterminals, productions and negative samples
accepted by the inferred grammars. The inference algorithm
is the k-tail algorithm with different values of k.

      Explosion                  Earthquake
Nonterm.     Product.      Nonterm.     Product.
No.          No.           No.          No.

681          720           939          996
681          720           939          996
669          720           928          996
641          692           900          970
604          656           856          926
566          618           804          874
525          577           747          817
484          536           688          758
443          495           629          699
402          454           570          640
320          372           452          522
238          290           334          404
156          208           216          286

No. of negative samples accepted: 0

We also tried the tail-clustering finite-state inference algorithm.
Since no two states have common sentences, no merge occurs. The
productions and nonterminals are the same as those of the k-tail
algorithm with $k = 20$. Again, this is due to the characteristics of
this specific data set, and should not be interpreted against the
algorithm itself. We can modify the condition for merging so that two
states are merged when the distance between some of their member
sentences is less than a threshold. This will guarantee a reduction of
grammar size, but again the recognition results may be unpredictable.

3.6 An Application of Syntactic Seismic
Recognition to Damage Assessment

Damage assessment of a structure after a strong earthquake is a very
complex problem (Yao, 1979). It is usually performed by a structural
engineering expert who makes his or her judgment based on personal
experience and professional knowledge. The key information includes
characteristics of the structure, observable damage, seismic
(vibration) recordings and nondestructive testing results. Ishizuka et
al. (1981) have proposed a rule-based damage assessment system which
employs fuzzy set theory and a production system with certainty factors
to infer the damage state. Its performance relies on proper assignment
of membership functions and design of inference rules. Pattern
recognition techniques, based on the analysis of seismic recordings,
can also be applied to damage assessment. Their advantages are that they
are easy to implement and involve no uncertainty factor.

Seismic recordings, i.e., acceleration and/or displacement recordings, are the only records which show the detailed response of the structure during a strong earthquake. They are quantitative, complete and objective. Therefore, if we want to apply pattern recognition techniques to damage assessment, the seismic recordings are very good candidates. A structure without damage behaves more stiffly than one with damage. Therefore, from the seismic recording, preferably the displacement recording because it contains no high-frequency noise, we can tell the relative degree of damage.

Since each building is different in structure, we have to make the assessment individually. One possible solution is to compare the top level displacement with the basement displacement. The basement displacement represents the ground motion, i.e., the input to the building. The deformation distance between these two waveforms will be small if the building is damaged; otherwise, the deformation distance will be large. Unfortunately, good training samples are unavailable so far. The real recordings are not only insufficient but also unclassified. However, there are a few experimental data from the laboratory which can be used as a starting point.

Figure 3.10 shows the top level displacement and basement acceleration (at the bottom) during a simulated earthquake test on the model of a ten-story reinforced concrete building. There are seven test runs in total. It is obvious from Figure 3.10 that the acceleration waveform is much more complicated than the displacement waveform. Since they are convertible, we chose the displacement seismogram for comparison.

Since only the basement accelerations are available, we have to compute the displacements using numerical integration. The basement displacements of the seven runs are all the same, as shown in Figure 3.11; only the magnitudes are intensified from run to run so as to ensure more damage after more runs. The top level displacements are shown in Figure 3.12. It is not difficult to see that the top level displacement of run seven is more similar in shape to the basement displacement than the top level displacement of run one is. This shows that the building structure becomes softer due to cracks, breaks and other implicit damage. Some potential damage may not be seen from the appearance of the building, but it will show in the seismic recording since the recording reflects the actual structural response. This is one of the reasons why the analysis of seismic recordings is important. The other reason is that we can compute the similarity, or conversely the deformation distance, between the waveforms, which can be further used in a knowledge-based damage assessment system.
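The conversion from acceleration to displacement mentioned above can be sketched as two passes of cumulative integration. This is only a minimal illustration: the trapezoidal rule, the function names, the zero initial velocity and displacement, and the absence of any baseline correction are all assumptions, since the text does not state which integration scheme was used.

```python
def integrate(samples, dt):
    """Cumulative trapezoidal integral of equally spaced samples, starting at 0."""
    out = [0.0]
    for prev, cur in zip(samples, samples[1:]):
        out.append(out[-1] + 0.5 * (prev + cur) * dt)
    return out


def acceleration_to_displacement(accel, dt):
    """Integrate twice: acceleration -> velocity -> displacement.

    Assumes zero initial velocity and displacement (hypothetical choice).
    """
    velocity = integrate(accel, dt)
    return integrate(velocity, dt)


# Constant acceleration of 2 units over 1 second gives a displacement of t**2.
displacement = acceleration_to_displacement([2.0] * 11, dt=0.1)
```

In practice a recorded accelerogram would usually need filtering or baseline correction before such an integration; that step is outside what the text describes.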

Computation of the deformation distance between the seismic waveforms is based on the modified dynamic time warping distance in Section 2.2. Comparing Figure 2.6 with Figure 3.12, we find that the waveforms in Figure 2.6 are actually taken from those in Figure 3.12. The slope constraints and local distance functions are shown in Figure

The selection of the string representation and the selection of the computational algorithm for the string deformation distance are correlated. We observed from the waveforms in Figure 3.12 that several local peaks are deformed and merge into a large peak. Therefore, we consider each peak as a component, i.e., primitive or symbol, of the string representation. The next problem is how to describe each peak. Of course, shape and geometric properties can describe a peak, but they are far more complicated than what is needed. Besides, it is difficult to implement these features in the distance computation. The area of each peak contains information about the duration and amplitude of the peak. Since different combinations of duration and amplitude may have the same area, area alone is ambiguous. But we do not need to worry about this problem: since we are dealing with recordings from the same structure, such arbitrarily contrasting shapes will not occur. We developed a special string deformation distance computation for this application, which is a modified dynamic time warping distance as shown in Section 2.2.1. This deformation distance is both ordinal, i.e., rank orders have meaning, and interval, i.e., separation between numbers is meaningful. However, the lower and upper bounds of this distance are open, i.e., the distance is in the interval $(0, M)$, where $M$ is the sum of the total areas of the two strings. For example, if $x = a_1 a_2 \ldots a_m$ and $y = b_1 b_2 \ldots b_n$, then

$$M = \sum_{i=1}^{m} a_i + \sum_{j=1}^{n} b_j .$$


Each seismic waveform $x$ is converted into a string of real numbers, $x = a_1 a_2 \ldots a_n$, $a_i > 0$, such that the $i$th component of the string, $a_i$, represents the area of the $i$th peak. A peak is defined here as the segment between two adjacent zero-crossing points. Therefore one peak may contain many local maxima and minima. Small ripples and spurious zero-crossings often occur because of noise. These noisy ripples can be removed by setting a threshold $T$. Only those peaks whose areas are larger than $T$ are considered effective components. The waveform is scanned from both ends until a peak larger than $T$ is reached in each direction. The leftmost peak larger than $T$ is the first component of the string and the rightmost peak larger than $T$ is the last component of the string. This process eliminates the noisy ripples before and after the signal. The noisy ripples within the signal are combined with the nearest peak which is greater than $T$. Therefore, only the significant peaks are converted into components of the string. The algorithm for computing the string deformation distance is similar to that of Sakoe and Chiba; only the slope constraints and local distance functions are different.
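The conversion of a waveform into a string of peak areas can be sketched as follows. This is a rough illustration under stated assumptions: the area is approximated by a rectangle-rule sum of absolute sample values, "nearest" is measured in number of peaks with ties going to the preceding peak, and the names `peak_areas` and `to_string` are hypothetical.

```python
def peak_areas(samples, dt=1.0):
    """Areas of the segments between adjacent zero crossings (rectangle rule)."""
    areas, current, sign = [], 0.0, 0
    for v in samples:
        s = (v > 0) - (v < 0)
        if s != 0 and sign != 0 and s != sign:
            areas.append(current)      # sign change closes the current peak
            current = 0.0
        if s != 0:
            sign = s
        current += abs(v) * dt
    if current > 0:
        areas.append(current)
    return areas


def to_string(areas, threshold):
    """Keep peaks larger than the threshold; fold interior ripples into the
    nearest large peak; drop ripples before the first and after the last."""
    big = [i for i, a in enumerate(areas) if a > threshold]
    if not big:
        return []
    merged = {i: areas[i] for i in big}
    for i in range(big[0], big[-1] + 1):
        if areas[i] <= threshold:
            # Assumption: distance counted in peaks, tie goes to the earlier peak.
            nearest = min(big, key=lambda j: (abs(j - i), j))
            merged[nearest] += areas[i]
    return [merged[i] for i in big]


waveform = [1, -1, 5, 5, -1, 1, 6, 6, -1]
string = to_string(peak_areas(waveform), threshold=2)
```

For this toy waveform the two leading ripples and the trailing ripple are discarded, while the interior ripple is added to the first large peak.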

The deformation distance between the basement displacement and the top level displacement of each run is plotted in Figure 3.14. Since each run of the test adds some damage to the structure, the degree of damage increases with the number of test runs. Greater damage makes the structure softer; consequently, the deformation distance between the basement waveform and the top level waveform is smaller. It can be seen from Figure 3.14 that the deformation distance becomes smaller after more test runs. Figure 3.14 also shows that large damage occurs during the first three runs, since the differences of the deformation distance between successive runs, i.e., the slope, are larger than those of the later runs.


In order to normalize for the length of strings that come from different events, the deformation distance in Figure 3.14 can be divided by the length of the basement waveform so that the deformation distances of different events can be compared. The domain of damage can be divided into several intervals, for example, negligible, slight, moderate, severe, etc. The deformation distance is used for classification of the damage degree: the class depends on which interval the deformation distance of an event falls into. Other information, such as human observations and system identification results, is useful auxiliary information, for example, to resolve the conflict when the distance falls at a boundary. But system identification is a very complicated matter and is mainly used for the study of system characteristics. Visual information is easy to obtain and is helpful in resolving conflict and ambiguity.
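The interval-based classification described above might be sketched as below. The boundary values are purely hypothetical placeholders (the text gives no numerical boundaries), and the function names are illustrative; the only property taken from the text is that a smaller normalized deformation distance corresponds to greater damage.

```python
# Hypothetical lower bounds on the normalized distance for each category.
# These numbers are placeholders, not values from the experiment.
BOUNDARIES = [(0.75, "negligible"), (0.50, "slight"), (0.25, "moderate")]


def normalized_distance(distance, basement_length):
    """Divide the deformation distance by the length of the basement waveform."""
    return distance / basement_length


def damage_class(norm_dist, boundaries=BOUNDARIES, worst="severe"):
    """Return the first category whose lower bound the distance reaches.

    Smaller distance means a softer, more damaged structure, so anything
    below the last bound falls into the worst category.
    """
    for lower, label in boundaries:
        if norm_dist >= lower:
            return label
    return worst


label = damage_class(normalized_distance(15.0, 50))
```

A value that falls close to a boundary would, as the text notes, be resolved with auxiliary information such as visual inspection rather than by the distance alone.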

The proposed system has not been tested on real data because such data are lacking. Research in damage assessment is only in its infancy. No organization or individual has been working on the collection and classification of real data. We must understand that appropriate samples for damage assessment are rather difficult to obtain. The structure must be equipped with recording devices, be subjected to a strong earthquake and sustain a certain degree of damage. Therefore, the demonstration of the proposed method is based on experimental data only. It attempts to show the feasibility rather than the practicability of the proposed method.

The segmentation of the waveform employs some structural (contextual) information. Peak extraction needs structural information, and the merging of small peaks with the nearest large peak also needs structural information. In our demonstration, only the top level recordings are used for comparison. Recordings from intermediate levels are similar to those of the top level but have smaller amplitude.

3.7  Conclusion 

In this chapter, syntactic pattern recognition has been applied to the discrimination of earthquakes and nuclear explosions based on seismic waveforms. The waveforms are segmented into fixed-length segments. A clustering procedure classifies these segments and a symbol is assigned to each cluster. Finite-state grammars are inferred from the training set using the k-tail inference algorithm. An error-correcting parser and a nearest-neighbor rule are compared with respect to their recognition speed and accuracy. Although the classification results seem encouraging, there is plenty of room for improvement. The selection of a set of distinguishing features is the most important part of practical pattern recognition applications. The difficulty increases when the classes overlap somewhat. Most of the features which are effective in the decision-theoretic approach can also be used in the syntactic approach for primitive recognition. The number of features selected should be kept as small as possible for the sake of computational efficiency.

In string distance computation, the assignment of weights for transformation errors is a difficult subject, especially when insertion, deletion and substitution are all included. The separation between clusters can be used as the substitution weight between the corresponding primitives, as we did in our experiment. The distance from a cluster center to the origin can be used as the insertion and/or deletion weight of that primitive. Heuristic information may be necessary and helpful in most cases.
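The weight assignment described above can be illustrated with a weighted string distance. This is a sketch only: it uses the standard dynamic-programming (Levenshtein-style) recurrence rather than the exact distance of the experiment, Euclidean distance between cluster centers is an assumption, and the cluster centers in the example are made-up values.

```python
import math


def weights_from_centers(centers):
    """Substitution weight = distance between cluster centers;
    insertion/deletion weight = distance of the center from the origin."""
    sub = {(p, q): math.dist(cp, cq)
           for p, cp in centers.items() for q, cq in centers.items()}
    indel = {p: math.hypot(*cp) for p, cp in centers.items()}
    return sub, indel


def weighted_distance(x, y, sub, indel):
    """Minimum total weight of substitutions, insertions and deletions
    needed to transform string x into string y."""
    m, n = len(x), len(y)
    d = [[0.0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        d[i][0] = d[i - 1][0] + indel[x[i - 1]]
    for j in range(1, n + 1):
        d[0][j] = d[0][j - 1] + indel[y[j - 1]]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            d[i][j] = min(
                d[i - 1][j - 1] + sub[(x[i - 1], y[j - 1])],  # substitute
                d[i - 1][j] + indel[x[i - 1]],                # delete
                d[i][j - 1] + indel[y[j - 1]],                # insert
            )
    return d[m][n]


# Made-up two-feature cluster centers for primitives a and b.
sub, indel = weights_from_centers({"a": (3.0, 4.0), "b": (6.0, 8.0)})
distance = weighted_distance("ab", "aa", sub, indel)
```

With these toy centers, substituting b by a costs 5 while deleting b and inserting a would cost 15, so the substitution is chosen.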

The syntactic approach can be modified to deal with stochastic models if the probabilities associated with pattern classes and training samples can be easily determined. In this case, there will be stochastic grammars, stochastic languages and maximum-likelihood parsing (see Fu, 1982). We did not apply the stochastic approach because the class and string probabilities are unavailable. They must be obtained from the analysis of previous records. If the probabilities can be determined precisely, which can be done to a certain degree, the class-overlap problem can be solved. The syntactic approach can be made more flexible by adding numerical information (attributes) to the primitives. At the same time, this can also make the pattern grammar less complex. We will discuss an attributed seismic grammar and its parsing in Chapter IV.

At  the  present  stage,  our  experiments  show  that  the  nearest- 
neighbor  decision  rule  is  faster  than  the  error-correcting  parsing. 
Although  the  speed  of  error-correcting  parsing  depends  on  the  struc¬ 
ture  of  the  grammar,  the  nearest-neighbor  rule  is  faster  in  general. 
VLSI  architectures  have  been  recently  applied  to  both  string  matching 
and  recognition  (by  parsing),  which  will  be  discussed  in  Chapter  V. 
Decision  between  simple,  faster  classification  and  sophisticated,  slower 
syntax  analysis  should  be  made  according  to  application  requirements. 

Syntactic pattern recognition has also been applied to damage assessment, where the seismic recordings are the physical measurements. Strings of various lengths are constructed from the seismic waveforms. A modified dynamic time warping is developed for computing the string distance. The segmentation of waveforms in syntactic pattern recognition usually uses shape information. Shape information appears to be unimportant for seismic signals; besides, it does not have much discrimination capability. The envelope of the signal appears to be a very good feature in some cases, for example, in Figure 1.2, but not in other cases, for example, when Figures 1.3 and 1.4 are compared with Figure 1.2. The application to damage assessment shows that a special algorithm for string distance computation must be developed for some applications when the general string distances seem unable to solve the problem.

CHAPTER IV

INFERENCE  AND  PARSING  OF  ATTRIBUTED  GRAMMAR 
FOR  SEISMIC  SIGNAL  RECOGNITION 

4.1  Introduction 

Attributed grammars were first formulated by Knuth (1968), where "meaning" can be assigned to a string in a context-free language by defining "attributes" of the symbols in a derivation tree for that string. The attributes are defined by functions associated with each production in the grammar. Although the idea of attributed grammars is due to Irons (see Knuth, 1968), Knuth included inherited attributes as well as synthesized attributes, which often leads to significant simplification. While attributed grammars were originally proposed for programming languages, they have recently been applied increasingly to pattern recognition. Tang and Huang (1979) used attributed grammars for image understanding. You and Fu (1978, 1979), Tsai and Fu (1980) and Tai and Fu (1981) have applied attributed grammars to shape recognition and transformation. Shi and Fu (1982) proposed an efficient error-correcting parser for attributed tree grammars where semantic information is associated with each terminal but no semantic rule is associated with the productions. Leung (1982) also proposed an error-correcting parser for attributed grammars with applications to character recognition. Knuth's formal semantics can also be applied to patterns described by picture description language (PDL) expressions (Fu, 1982).

The advantages of using attributed grammars for pattern recognition are twofold. The inclusion of semantic information increases the flexibility of pattern description; at the same time, it reduces the syntactic complexity of the pattern grammar. We may notice that all the above applications are essentially to pictorial shape recognition, where length and angle are useful semantic information. This same set of attributes can also be used in waveform shape recognition, e.g., ECG analysis, where shape information is very important in recognition. However, they cannot be applied to signals, e.g., EEG, seismic and speech, where shape information is not particularly important. The segmentation of these signals usually corresponds to a short, fixed- or variable-length time period. In order not to overlook any transition, the time periods are usually kept relatively short. Therefore, it is very common that the same primitive lasts for several periods. This often makes the pattern strings and the inferred grammars unnecessarily complicated. The numbers of productions and nonterminal symbols are usually very large, as we can see from the experimental results in Section 3.5. Instead of keeping track of all these identical primitives, we can use one syntactic symbol to represent the type of the primitive, with an attribute to indicate the length of the primitive. This leads to the application of the length attribute to seismic and other similar digital signal analysis.

A pattern primitive $a$ can be represented by a 2-tuple

$$a = (s, x)$$

where $s$ is a syntactic symbol denoting the primitive structure of $a$, and $x = (x_1, x_2, \ldots, x_m)$, $m \ge 0$, is an $m$-dimensional semantic vector with each $x_i$, $i = 1, 2, \ldots, m$, denoting a numerical measurement. A pattern string can be represented by $a_1 a_2 a_3 \ldots a_n$, where $a_i = (s_i, l_i)$ and $l_i$ is the length of primitive $a_i$. For a fixed-length segmentation, $l_i = c$ for all $i$, where $c$ is a constant. For a variable-length segmentation, $l_i$ may or may not equal $l_j$ when $i \ne j$. In our case, $l_i = c$ for $1 \le i \le 20$, where $c = 60$ points. For simplicity, with constant length in mind, we can eliminate the semantic part. For example, a pattern string may look like

aaadgggegggggggegeeg

where the letters are syntactic symbols. It can be further simplified by merging adjacent identical symbols, so the above string becomes

a d g e g e g e g
3 1 3 1 7 1 1 2 1

where the numbers are the numbers of unit lengths; each unit length contains 60 points in our case. This idea gives some storage improvement in string representation, and it gives a significant improvement in grammatical inference, as we will see in the next section. Although we used finite-state grammars to describe the seismic patterns in Chapter III, we will use attributed context-free grammars (cfg's) here. This is because attributed finite-state grammars (fsg's) do not give much reduction in the number of productions and nonterminals. Only attributed cfg's can drastically reduce the number of productions and therefore make the recognition more efficient. An error-correcting parser for attributed context-free grammars is given in Section 4.3. Stochastic attributed grammars and parsing will be discussed in Section 4.4.
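The merging of adjacent identical primitives into (symbol, length) pairs described in this section amounts to a run-length encoding. A minimal sketch, assuming each primitive is a single character and has unit length unless lengths are supplied (the function name `merge_primitives` is hypothetical):

```python
from itertools import groupby


def merge_primitives(string, lengths=None):
    """Merge runs of identical adjacent primitives.

    Returns a list of (symbol, total length) pairs; each primitive counts
    as one unit length unless an explicit list of lengths is given.
    """
    if lengths is None:
        lengths = [1] * len(string)
    merged = []
    for symbol, run in groupby(zip(string, lengths), key=lambda pair: pair[0]):
        merged.append((symbol, sum(length for _, length in run)))
    return merged


pairs = merge_primitives("aaadgggegggggggegeeg")
```

For the example string of this section, the result is the symbol sequence a d g e g e g e g with lengths 3 1 3 1 7 1 1 2 1, so 20 stored symbols are reduced to 9 pairs.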

4.2 Inference of Attributed Grammar
for  Seismic  Signal  Recognition 

An attributed context-free grammar is a 4-tuple $G = (V_N, V_T, P, S)$ where each production rule contains two parts: a syntactic rule and a semantic rule (Knuth, 1968). Each symbol $X \in (V_N \cup V_T)$ is associated with a finite set of attributes $A(X)$, and $A(X)$ is partitioned into two disjoint sets, the synthesized attribute set $A_0(X)$ and the inherited attribute set $A_1(X)$. The syntactic rule has the following form

$$X_{k0} \rightarrow X_{k1} X_{k2} \ldots X_{k n_k}$$

where $k$ means the $k$th production. The semantic rule maps values of certain attributes of $X_{k0}, X_{k1}, \ldots, X_{k n_k}$ into the value of some attribute of $X_{kj}$. The evaluation of synthesized attributes is based on the attributes of the descendants of the nonterminal symbol; therefore it proceeds bottom-up in the tree structure. On the contrary, the evaluation of inherited attributes is based on the attributes of the ancestors; therefore it proceeds top-down in the tree structure.

In Chapter III, we chose a set of 41 explosion seismic records as training samples. Each record has been converted into a string of 20 primitives. If we use the k-tail algorithm to infer a finite-state grammar for the pattern class with $k = 20$, the total number of productions will be 720 and the number of nonterminal symbols will be 681. In order to reduce the size of the grammar we use one length attribute, i.e., the number of unit lengths. The input strings are attributed strings, and each production rule of the grammar has a syntactic part as well as a semantic part which contains both synthesized and inherited attributes. The type of grammar is also upgraded to a context-free grammar, due to the type of the S-productions. Tai and Fu (1982) used the length attribute of the strings in the inference of a class of context-free programmed grammars (cfpg). However, the length attribute is used only for the construction of the control diagram, i.e., a graphical representation of the success and failure go-to fields. The inferred cfpg is nonattributed, and its parsing was not discussed. We use the length attribute in both inference and parsing. The inferred grammars are attributed grammars, and the attribute plays an important role in parsing.

To explain our inference procedure, let us first consider one input string

aaadgggegggggggegeeg

where each primitive has a length attribute 1, which means 1 unit length. First, it is converted into the following string by merging identical primitives.

a d g e g e g e g
3 1 3 1 7 1 1 2 1

Theoretically, the length attribute is continuous. But in digital signal processing, the waveforms are represented by a finite number of sampled points; therefore, the length is always discrete in practical cases. In our case, the length attribute is the number of unit lengths. It is discrete and is a positive integer. Then we can infer the following attributed grammar

      Syntactic rules        Semantic rules

(1)   S → ADGEGEGEG          L(A)=3, L(D)=1, L(G1)=3,
                             L(E1)=1, L(G2)=7, L(E2)=1,
                             L(G3)=1, L(E3)=2, L(G4)=1
(2)   A → aA                 l(A1) = l(a) + l(A2)
(3)   A → a                  l(A) = l(a)
(4)   D → dD                 l(D1) = l(d) + l(D2)
(5)   D → d                  l(D) = l(d)
(6)   E → eE                 l(E1) = l(e) + l(E2)
(7)   E → e                  l(E) = l(e)
(8)   G → gG                 l(G1) = l(g) + l(G2)
(9)   G → g                  l(G) = l(g)

where $L$ denotes the inherited length attribute, $l$ denotes the synthesized length attribute, and the number right after a nonterminal symbol is used to distinguish between occurrences of like nonterminals. Note that the inherited attribute $L$ is not passed down to the descendants as it usually is; rather, it is used to maintain the semantic information of the training string and as a reference for comparison in parsing. For simplicity we let $l(a) = 1$ for all $a \in V_T$. When we have another input string

aacdehihffffffhmffff

we convert it into

a c d e h i h f h m f
2 1 1 1 1 1 1 6 1 1 4

and add to the grammar the following productions

Syntactic rules        Semantic rules

S → ACDEHIHFHMF        L(A)=2, L(C)=1, L(D)=1,
                       L(E)=1, L(H1)=1, L(I)=1,
                       L(H2)=1, L(F1)=6, L(H3)=1,
                       L(M)=1, L(F2)=4
C → cC                 l(C1) = l(c) + l(C2)
C → c                  l(C) = l(c)
H → hH                 l(H1) = l(h) + l(H2)
H → h                  l(H) = l(h)
I → iI                 l(I1) = l(i) + l(I2)
I → i                  l(I) = l(i)
F → fF                 l(F1) = l(f) + l(F2)
F → f                  l(F) = l(f)

We may notice that after reading a few input strings there will be no need to add productions such as C → cC and C → c, since they will already be in the grammar. We then only need to add one production for each input string, i.e., the first production in the above example. In fact, there are $2m + n$ productions for a set of $n$ training strings, where $m$ is the number of nonterminal symbols other than $S$. We now formulate the inference algorithm for attributed grammars which use the length attribute.

Algorithm 4.1 Inference of Attributed Seismic Grammar
Using a Length Attribute

Input: A set of training strings where each primitive has a
syntactic symbol and a length attribute.

Output: An attributed grammar.

Method:

(1) For each input string, merge identical primitives; the length of a merged primitive is the sum of the individual lengths.

(2) For each input string $a_1 a_2 a_3 \ldots a_k$, add to the grammar the production $S \rightarrow A_1 A_2 A_3 \ldots A_k$, where $A_i$ is the nonterminal corresponding to terminal $a_i$, and the semantic rule $L(A_i) = l_i$, $1 \le i \le k$, where $l_i$ is the length attribute of primitive $a_i$.

(3) For each primitive $a$, add to the grammar the productions $A \rightarrow aA$, $l(A1) = l(a) + l(A2)$, and $A \rightarrow a$, $l(A) = l(a)$, if they do not already exist.

(4) The set of terminals includes all the different primitives; the set of nonterminals includes all the nonterminal symbols in Step (2).
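The four steps of Algorithm 4.1 can be sketched in a few lines. This is an illustrative sketch, not the original implementation: it assumes primitives are single lowercase letters of unit length, uses the corresponding uppercase letter as the nonterminal, and stores the grammar in a plain dictionary whose keys are hypothetical.

```python
from itertools import groupby


def merge(string):
    """Step (1): merge identical adjacent primitives (unit length each)."""
    return [(sym, len(list(run))) for sym, run in groupby(string)]


def infer_grammar(training_strings):
    s_productions = []     # (right-hand side, inherited lengths L)
    primitive_rules = {}   # nonterminal -> its two syntactic rules
    for string in training_strings:
        merged = merge(string)
        rhs = "".join(sym.upper() for sym, _ in merged)
        inherited = tuple(length for _, length in merged)
        if (rhs, inherited) not in s_productions:        # Step (2)
            s_productions.append((rhs, inherited))
        for sym, _ in merged:                            # Step (3)
            nt = sym.upper()
            primitive_rules.setdefault(nt, [f"{nt} -> {sym}{nt}", f"{nt} -> {sym}"])
    return {                                             # Step (4)
        "S": s_productions,
        "rules": primitive_rules,
        "terminals": sorted(nt.lower() for nt in primitive_rules),
        "nonterminals": sorted(primitive_rules),
    }


grammar = infer_grammar(["aaadgggegggggggegeeg", "aacdehihffffffhmffff"])
```

For the two example strings of this section, the sketch yields two S-productions and nine primitive nonterminals, i.e. 2 × 9 + 2 = 20 productions, in line with the $2m + n$ count given above.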

A  flow  chart  of  this  inference  algorithm  is  given  in  Figure  4.1.  This 
inferred  grammar  will  generate  excessive  strings  if  we  apply  syntactic 
rules  only.  However,  we  can  use  semantic  rules  (inherited  attributes) 
to  restrict  the  grammar  so  that  no  excessive  strings  are  generated. 
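The effect of the inherited attributes as a restriction can be illustrated by a small membership check. This is a sketch under assumptions: primitives are single lowercase letters of unit length, S-productions are stored as (right-hand side, inherited lengths) pairs, and no error correction is attempted; the function names are hypothetical.

```python
from itertools import groupby


def merge(string):
    """Merge identical adjacent primitives into (symbol, length) pairs."""
    return [(sym, len(list(run))) for sym, run in groupby(string)]


def generated(string, s_productions, use_semantics=True):
    """True if the string is generated by some S-production.

    With syntactic rules only, any run lengths are accepted; with the
    semantic rules, the synthesized lengths l must equal the inherited
    lengths L stored with the S-production.
    """
    merged = merge(string)
    skeleton = "".join(sym.upper() for sym, _ in merged)
    synthesized = tuple(length for _, length in merged)
    for rhs, inherited in s_productions:
        if rhs == skeleton and (not use_semantics or inherited == synthesized):
            return True
    return False


productions = [("ADGEGEGEG", (3, 1, 3, 1, 7, 1, 1, 2, 1))]
syntactic_only = generated("adgegegeg", productions, use_semantics=False)
with_semantics = generated("adgegegeg", productions)
```

The short string adgegegeg is accepted by the syntactic rules alone but rejected once the inherited lengths are compared, which is the sense in which the semantic rules prevent excessive strings.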

The  inferred  grammar  from  the  41  training  strings  is  shown  in  the 
following. 


      Syntactic rules               Semantic rules

(1)   S → ACAGHFIJMJFMKMJM          (1,1,1,1,1,1,1,2,1,1,1,2,1,1,3,1)
(2)   S → MKLGIFDIFHFMKILIB         (1,1,1,1,1,1,1,1,1,1,2,1,1,1,2,1,2)
(3)   S → LEIFJLFBFHDJFKJL          (3,2,1,1,1,1,1,1,1,1,1,2,1,1,1,1)
(4)   S → LJLEFKJHFJMJMIFJ          (1,1,1,1,1,2,1,1,3,1,1,1,1,1,1,2)
(5)   S → ULGFHFHFHIFMJFLFM         (1,1,1,1,1,1,1,2,1,1,1,2,1,1,1,1,1,1)
(6)   S → ACDEHIHFHMF               (2,1,1,1,1,1,1,6,1,1,4)
(7)   S → ALGIMLMKJMLMJLJL          (1,2,1,2,1,1,1,1,1,1,3,1,1,1,1,1)
(8)   S → LMLGEMKJKMKJKMJM          (1,1,1,1,1,1,2,1,1,1,1,1,1,1,3,2)

(9)   S → CKDIFKJMKMJMJM            (1,2,1,1,4,1,1,1,1,1,2,2,1,1)
(10)  S → DLDHJMLMJFLMKL            (2,1,1,2,1,3,1,1,1,1,1,1,2,2)
(11)  S → CACEIFKMKJMKM             (1,1,1,1,1,1,1,1,3,2,5,1,1)
(12)  S → LMGFKFIFMJM               (2,1,1,1,1,1,2,3,3,4,1)
(13)  S → ABCGIMKMKBKJM             (1,1,1,1,1,4,1,2,2,2,1,1,2)
(14)  S → CEHIJFMFKJMFMFJ           (3,1,1,1,1,1,2,2,1,1,2,1,1,1,1)
(15)  S → KMKEFMFIJKMJKJKM          (1,1,1,1,2,1,1,1,1,1,1,1,3,1,1,2)
(16)  S → LJEHDFLJMFJUFJ            (2,1,2,1,1,1,2,2,1,1,1,1,1,2,1)
(17)  S → JMGFHMFHFMHLJIM           (1,2,2,2,1,1,1,2,1,2,1,1,1,1,1)
(18)  S → BJEFKMKMKMKMKMK           (2,1,2,1,1,1,2,1,1,3,1,1,1,1,1)
(19)  S → BCBGHEFHFJF               (1,1,1,2,3,1,1,7,1,1,1)
(20)  S → IKEIHIFIHFIHFLF           (1,2,1,1,1,1,2,1,1,4,1,1,1,1,1)
(21)  S → DFHFDFLIF                 (6,2,1,1,1,2,3,2,2)
(22)  S → ACEHFJMKFJMKM             (1,2,1,2,3,1,3,1,1,1,2,1,1)
(23)  S → JLGHDHLMJL                (1,2,1,1,1,1,7,1,4,1)
(24)  S → KMBEHFMKBKM               (1,1,1,2,1,1,1,4,2,5,1)
(25)  S → KBKGHMFMFKHMJ             (1,1,1,1,1,1,2,2,2,1,1,5,1)
(26)  S → LMIEIHFHJIKMLKLK          (1,2,1,1,1,1,1,1,1,1,1,1,3,1,2,1)
(27)  S → ADGEGEGEG                 (3,1,3,1,7,1,1,2,1)
(28)  S → MACGHFJMFJMJM             (1,1,1,1,1,1,1,3,1,1,2,2,4)
(29)  S → JMGEFKJMKJKMJ             (2,1,1,1,1,2,2,5,1,1,1,1,1)
(30)  S → LDGEDHDLDLDLD             (1,2,1,1,1,1,2,1,1,1,1,2,5)
(31)  S → IHFEIEHIHIFIFIDI          (1,1,1,2,1,1,1,1,1,2,1,2,1,2,1,1)
(32)  S → HIEIEIFHDHDEBFHF          (1,2,2,1,1,2,1,1,1,1,1,1,1,2,1,1)
(33)  S → GDEGEIEGIGEDGDE           (1,1,1,1,2,1,5,1,1,1,1,1,1,1,1)
(34)  S → KBHDGHDHGDGDGHD           (2,1,1,1,1,1,1,1,4,1,1,2,1,1,1)
(35)  S → CVIGEFIFKFMFMJM           (1,1,1,1,1,3,1,2,1,1,3,1,1,1,1)
(36)  S → LJLGIFLFMJFLFMJMJF        (1,1,1,1,1,1,1,1,1,1,1,3,1,1,1,1,1,1)
(37)  S → DFDHDHDLDF                (4,1,2,1,2,4,1,1,3,1)
(38)  S → HJLFEFGEGFIFEH            (1,1,1,1,1,1,1,5,2,1,1,1,2,1)
(39)  S → FJILFHGEIEHEGD            (1,1,1,1,2,1,2,2,1,1,2,1,2,2)
(40)  S → BIHEGDGHGHG               (3,1,3,2,3,1,1,2,1,1,2)
(41)  S → CKCFHDGHGLHDH             (1,1,1,2,1,1,1,1,6,1,2,1,1)

(42)  A → aA                        l(A1) = l(a) + l(A2)
(43)  A → a                         l(A) = l(a)
(44)  B → bB                        l(B1) = l(b) + l(B2)
(45)  B → b                         l(B) = l(b)
(46)  C → cC                        l(C1) = l(c) + l(C2)
(47)  C → c                         l(C) = l(c)
(48)  D → dD                        l(D1) = l(d) + l(D2)
(49)  D → d                         l(D) = l(d)
(50)  E → eE                        l(E1) = l(e) + l(E2)
(51)  E → e                         l(E) = l(e)
(52)  F → fF                        l(F1) = l(f) + l(F2)
(53)  F → f                         l(F) = l(f)
(54)  G → gG                        l(G1) = l(g) + l(G2)
(55)  G → g                         l(G) = l(g)
(56)  H → hH                        l(H1) = l(h) + l(H2)
(57)  H → h                         l(H) = l(h)
(58)  I → iI                        l(I1) = l(i) + l(I2)
(59)  I → i                         l(I) = l(i)
(60)  J → jJ                        l(J1) = l(j) + l(J2)
(61)  J → j                         l(J) = l(j)
(62)  K → kK                        l(K1) = l(k) + l(K2)
(63)  K → k                         l(K) = l(k)
(64)  L → lL                        l(L1) = l(l) + l(L2)
(65)  L → l                         l(L) = l(l)
(66)  M → mM                        l(M1) = l(m) + l(M2)
(67)  M → m                         l(M) = l(m)

where (1,1,1,1,1,1,1,2,1,1,1,2,1,1,3,1) is a shorthand for the inherited
attributes, whose meaning should be clear from the previous examples.

This attributed grammar has 67 productions, a reduction of more than
90% from the nonattributed grammar, which requires 720 productions for
91% correct recognition. There are only 13 nonterminal symbols in this
attributed grammar, which is equal to the number of terminal symbols.
The nonattributed grammar has 681 nonterminals. The number of
nonterminal symbols will not increase in this attributed grammar, and
the number of productions will increase by at most one for each
additional input string. We can also expand the inherited attribute
into a set of numbers. For example, we may let $L(A) = \{2, 3, 4\}$,
which means that the length of nonterminal symbol $A$ can be 2, 3 or 4.
This greatly increases the flexibility in some applications.
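As an illustration of how the inherited length attributes work, the
following minimal Python sketch expands an S-production into the
terminal string it derives. It assumes, as in the productions above,
that each nonterminal X derives only repetitions of its own terminal x.
The function name `expand` and the tuple representation are
hypothetical choices made for this illustration and are not part of
the original system.

```python
# Two of the S-productions listed above, written as
# (right-hand-side nonterminals, inherited length attributes).
S_PRODUCTIONS = [
    ("HJLFEFGEGFIFEH", (1, 1, 1, 1, 1, 1, 1, 5, 2, 1, 1, 1, 2, 1)),
    ("CKCFHDGHGLHDH", (1, 1, 1, 2, 1, 1, 1, 1, 6, 1, 2, 1, 1)),
]


def expand(nonterminals, lengths):
    """Derive the terminal string of one S-production.

    A nonterminal X with inherited length n is rewritten by n - 1
    applications of X -> xX followed by one X -> x, i.e. x repeated n times.
    """
    if len(nonterminals) != len(lengths):
        raise ValueError("one length attribute is needed per nonterminal")
    return "".join(nt.lower() * n for nt, n in zip(nonterminals, lengths))
```

Both example productions expand to strings of 20 primitives, the sum
of their length attributes.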


4.3  Error-Correcting  Parsing  of 
Attributed  Seismic  Grammar 

A modified Earley's parsing algorithm is used for our attributed
context-free seismic grammars. We assume that substitution, insertion
and deletion of terminal symbols are allowed, but that no substitution,
insertion or deletion of nonterminal symbols is permitted. This means
that the length of a local segment is variable and even local noise is
tolerable, but a whole local segment cannot be deleted entirely. A
local segment is a segment of identical terminal symbols. An item of
this parsing algorithm has the form $[A \to \alpha \cdot \beta, \eta,
\xi, i]$, where $\eta$ is a counter for local syntactic deformation,
which accumulates the total cost of substitution of terminal symbols,
and $\xi$ is used for two different purposes. When $A \ne S$, $\xi$ is
used as the synthesized attribute of $A$. On the other hand, if
$A = S$, then $\xi$ is used as a counter for semantic deformation,
which records the total length variation of nonterminal symbols. The
pointer $i$ is the same as in a conventional Earley's parser. A parsing
algorithm for expanded attributed grammars using the length attribute
has been proposed by Leung (1982). As usual, we do not need an expanded
grammar: all the deformations are examined during parsing, and errors
are recorded in the appropriate counters. The parsing algorithm is as
follows.

Algorithm 4.2  Minimum-Distance Error-Correcting Parsing Algorithm
for Attributed Context-Free Seismic Grammar.

Input: An attributed seismic grammar $G = (V_N, V_T, P, S)$ and an
input string $y = b_1 b_2 \ldots b_m$ in $V_T^*$.

Output: The parse lists $I_0, I_1, \ldots, I_m$, and the decision
whether $y$ is accepted by the grammar $G$, together with the syntactic
and semantic deformation distances.

Method:

(1) Set $j = 0$. Add $[S \to \cdot\,\alpha, 0, 0, 0]$ to $I_j$ if
$S \to \alpha$ is a production in $P$.

(2) Repeat steps (3) and (4) until no new items can be added to $I_j$.

(3) If $[A \to \alpha \cdot B\beta, \eta, \xi, i]$ is in $I_j$ and
$B \to \gamma$ is a production in $P$, then add item
$[B \to \cdot\,\gamma, 0, 0, j]$ to $I_j$.

(4) (a) If $[A \to \alpha \cdot, \eta_2, \xi_2, i]$ is in $I_j$ and
$[A \to a \cdot A, \eta_1, \xi_1, k]$ is in $I_i$, then add an item
$[A \to aA \cdot, \eta_1 + \eta_2, \xi_1 + \xi_2, k]$ to $I_j$. (There
is no need to check collision here, since there will be no other item
of the form $[A \to aA \cdot, \eta, \xi, k]$ in $I_j$.)

(b) If $[A \to \alpha \cdot, \eta_2, \xi_2, i]$ is in $I_j$ and
$[S \to \beta \cdot A\gamma, \eta_1, \xi_1, k]$ is in $I_i$, then add
an item $[S \to \beta A \cdot \gamma, \eta_1 + \eta_2,
\xi_1 + (L(A) - \xi_2), k]$ to $I_j$, where $L(A)$ is the inherited
attribute of the nonterminal symbol $A$.

(5) If $j = m$, go to step (7); otherwise $j = j + 1$.

(6) For each item $[A \to \cdot\,a\beta, \eta, \xi, i]$ in $I_{j-1}$,
add $[A \to a \cdot \beta, \eta + S(a, b_j), \xi + L(b_j), i]$ to
$I_j$, where $L(b_j)$ is the synthesized attribute of $b_j$. For
simplicity, we may let $L(b_j) = 1$ for all $j$. $S(a, b_j)$ is the
substitution cost, and $S(a, b_j) = 0$ when $a = b_j$. Go to (2).

(7) If item $[S \to \alpha \cdot, \eta, \xi, 0]$ is in $I_m$, then
string $y$ is accepted by grammar $G$, where $\eta$ is the syntactic
deformation distance and $\xi$ is the semantic deformation distance;
otherwise, string $y$ is not accepted by grammar $G$. Exit.

Figure 4.2  A flow chart of the parsing algorithm (Algorithm 4.2).

A flow chart of this parsing algorithm is given in Figure 4.2. Two
points are worth noting. (1) Parse extraction is straightforward once
the first S-production is identified, so we do not include a parse
extraction algorithm; because attributes are used, the syntactic part
is much simpler than that of a nonattributed (context-free) grammar.
(2) Deformation of any type on terminal symbols will be accepted. As a
simple example, the string 'aaadgggegggggggegeeg' will be accepted by
our seismic grammar with no error; the string 'aadgggeg...' will be
accepted with a semantic error of one unit length on 'A'; and the
string 'abadgggeg...' will also be accepted, with a syntactic
substitution error $S(a,b)$.
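The three example strings can be checked with a small
dynamic-programming stand-in, sketched below. This is not the
Earley-style item-list parser of Algorithm 4.2. It handles a single
S-production, assumes unit substitution cost, measures the length
variation of each nonterminal as an absolute difference, and weights
the syntactic and semantic distances equally when choosing the best
alignment. The function name `deformation` is hypothetical, and the
S-production used in the usage note (right-hand side ADGEGEGEG with
lengths 3,1,3,1,7,1,1,2,1) is an assumption inferred from the
error-free example string.

```python
def deformation(nonterminals, lengths, y, sub_cost=None):
    """Return (eta, xi) for the cheapest split of y into local segments.

    eta: total substitution cost (syntactic deformation).
    xi:  total length variation of the nonterminals (semantic deformation).
    Every nonterminal must cover at least one symbol, so a whole local
    segment can never be deleted. Returns None if y is too short.
    """
    if sub_cost is None:
        sub_cost = lambda a, b: 0 if a == b else 1  # assumed unit cost
    m, n = len(nonterminals), len(y)
    inf = float("inf")
    # best[p][k] = (eta + xi, eta, xi) for the first p nonterminals covering y[:k]
    best = [[(inf, 0, 0)] * (n + 1) for _ in range(m + 1)]
    best[0][0] = (0, 0, 0)
    for p in range(1, m + 1):
        a = nonterminals[p - 1].lower()
        for k in range(p, n + 1):
            for start in range(p - 1, k):  # y[start:k] is a nonempty segment
                total, eta, xi = best[p - 1][start]
                if total == inf:
                    continue
                segment = y[start:k]
                d_eta = sum(sub_cost(a, b) for b in segment)
                d_xi = abs(lengths[p - 1] - len(segment))
                candidate = (total + d_eta + d_xi, eta + d_eta, xi + d_xi)
                if candidate[0] < best[p][k][0]:
                    best[p][k] = candidate
    total, eta, xi = best[m][n]
    return None if total == inf else (eta, xi)
```

With the assumed production, the error-free string gives (0, 0), the
string with one 'a' removed gives (0, 1), and the string with 'b' in
place of the second 'a' gives (1, 0), provided the elided tails of the
last two strings match the error-free string.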

The time complexity of Algorithm 4.2 is $O(n^2)$, where $n$ is the
length of the input string, since each item list $I_j$ takes time
$O(j)$ to complete. However, since we considered only substitution
errors in the seismic recognition problem in Section 3.5, a simplified
version of Algorithm 4.2, namely Algorithm 4.4, can be applied. This
special parser is faster than Algorithm 3.2. The experimental results
are given in Section 4.5. How much advantage we gain from using
attributes depends on the selection and characteristics of the
training samples. If the training samples are very much alike, then
fewer syntactic rules are likely to be needed; instead, attributes are
used to distinguish between different patterns. An attributed grammar
can also be constructed manually, based on knowledge about the pattern
sources. This may sometimes be a great advantage.


4.4  Stochastic  Attributed  Grammar 
and  Parsing  for  Seismic  Analysis 

Although we do not know the probability distribution of the training
samples at this moment, it is possible to estimate it if more samples
are available. If the probability distribution of the training samples
is known, then we can infer the production probabilities using the
algorithm described in Lee and Fu (1972b). Therefore, we also include
a parsing algorithm for stochastic attributed seismic grammars in this
section. A stochastic version of the attributed grammar shown in
Section 4.2 can be formulated as follows. First, a probability is
associated with each production. Second, a probability distribution is
associated with all the possible attributes. For example, if
originally $L(A) = \{3, 4, 5\}$, it may now become
$L(A) = \{(3, 0.25), (4, 0.5), (5, 0.25)\}$, where
$0.25 = \mathrm{Prob}[L(A) = 3]$. Finally, probabilities instead of
costs are used to characterize substitution transformations. The
probability associated with each S-production is the probability of
occurrence of the training string which contributes to that
production.

The parsing algorithm for the stochastic attributed seismic grammar is
very similar to Algorithm 4.2 except for the following changes. First,
$\eta$ is now the probability of syntactic substitution deformation.
Second, $\xi$ is still used as a synthesized attribute of $A$ when
$A \ne S$; however, when $A = S$, $\xi$ is the probability of semantic
deformations.

Algorithm 4.3  Error-Correcting Parsing Algorithm for
Stochastic Attributed Seismic Grammar

Input: An attributed seismic grammar $G = (V_N, V_T, P, S)$ and an
input string $y = b_1 b_2 \ldots b_m$ in $V_T^*$.

Output: The parse lists $I_0, I_1, \ldots, I_m$, and the decision
whether $y$ is accepted by the grammar $G$, together with the syntactic
and semantic deformation probabilities.

Method:

(1) Set $j = 0$. Add $[S \to \cdot\,\alpha, 1, 1, 0]$ to $I_j$ if
$S \to \alpha$ is a production in $P$.

(2) Repeat steps (3) and (4) until no new items can be added to $I_j$.

(3) If $[A \to \alpha \cdot B\beta, \eta, \xi, i]$ is in $I_j$ and
$B \to \gamma$ is a production in $P$, then add item
$[B \to \cdot\,\gamma, 1, 1, j]$ to $I_j$.

(4) (a) If $[A \to \alpha \cdot, \eta_2, \xi_2, i]$ is in $I_j$ and
$[A \to a \cdot A, \eta_1, \xi_1, k]$ is in $I_i$, then add an item
$[A \to aA \cdot, \eta_1 \cdot \eta_2, \xi_1 + \xi_2, k]$ to $I_j$.
(There is no need to check collision here, since there will be no
other item of the form $[A \to aA \cdot, \eta, \xi, k]$ in $I_j$.)

(b) If $[A \to \alpha \cdot, \eta_2, \xi_2, i]$ is in $I_j$ and
$[S \to \beta \cdot A\gamma, \eta_1, \xi_1, k]$ is in $I_i$, then add
an item $[S \to \beta A \cdot \gamma, \eta_1 \cdot \eta_2,
\xi_1 \cdot \mathrm{Prob}\{L(A) = \xi_2\}, k]$ to $I_j$, where $L(A)$
is the inherited attribute of the nonterminal symbol $A$.

(5) If $j = m$, go to step (7); otherwise $j = j + 1$.

(6) For each item $[A \to \cdot\,a\beta, \eta, \xi, i]$ in $I_{j-1}$,
add $[A \to a \cdot \beta, \eta \cdot P_S(b_j \mid a), \xi + L(b_j),
i]$ to $I_j$, where $L(b_j)$ is the synthesized attribute of $b_j$.
For simplicity, we may let $L(b_j) = 1$ for all $j$.
$P_S(b_j \mid a)$ is the substitution probability. Go to (2).

(7) If item $[S \to \alpha \cdot, \eta, \xi, 0]$ is in $I_m$, then
string $y$ is accepted by grammar $G$, where $\eta$ is the syntactic
deformation probability and $\xi$ is the semantic deformation
probability; otherwise, string $y$ is not accepted by grammar $G$.
Exit.
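The probability bookkeeping for one local segment can be sketched as
follows. This is a minimal illustration, not the parser itself. The
length distribution for A is the example given in the text, while the
substitution model `substitution_prob` (a hypothetical 0.9 probability
that a symbol is unchanged, with the remainder spread uniformly over
the other 12 terminals) is purely an assumption made for this example.

```python
from math import prod

# Attribute distribution from the text: L(A) = {(3, 0.25), (4, 0.5), (5, 0.25)}.
LENGTH_DIST = {"A": {3: 0.25, 4: 0.5, 5: 0.25}}


def substitution_prob(a, b, p_same=0.9, alphabet_size=13):
    """Assumed P_S(b | a): probability that terminal a is observed as b."""
    return p_same if a == b else (1.0 - p_same) / (alphabet_size - 1)


def segment_probabilities(nonterminal, segment):
    """Return (eta, xi) contributed by one nonterminal's local segment.

    eta: product of substitution probabilities over the segment.
    xi:  probability that the nonterminal has the observed length.
    """
    a = nonterminal.lower()
    eta = prod(substitution_prob(a, b) for b in segment)
    xi = LENGTH_DIST[nonterminal].get(len(segment), 0.0)
    return eta, xi
```

For a whole S-production, the per-segment values would be multiplied
together, which mirrors the products accumulated in steps (4) and (6).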

A flow chart of this parsing algorithm is given in Figure 4.3. Owing
to the error-correcting characteristics, there may be more than one
item of the form $[S \to \alpha \cdot, \eta, \xi, 0]$ in $I_m$. In that
case, a decision should be made based on $\eta$ and $\xi$. Weights can
be assigned to $\eta$ and $\xi$. Nevertheless, this is a rather
subjective judgement, and is always a problem when using both
syntactic and semantic information.

Figure 4.3  A flow chart of the parsing algorithm (Algorithm 4.3).


4.5  Experimental Results and Discussion


In this chapter we have shown an attributed seismic grammar which has
only 67 productions and 13 nonterminal symbols, compared to the 720
productions and 681 nonterminal symbols of a nonattributed finite-state
grammar. An error-correcting parser (Algorithm 4.2) is also proposed
for this attributed grammar. Since the error-correcting parser of
Algorithm 3.2 considered only the substitution error, a simplified
version of Algorithm 4.2 which ignores the length variation can be
used to greatly increase the processing speed. This is shown in
Algorithm 4.4.

Algorithm 4.4  Top-Down No-Backtrack Error-Correcting Parsing
Algorithm for Attributed Seismic Grammar.

Input: An attributed seismic grammar $G = (V_N, V_T, P, S)$ and an
input string $y = b_1 b_2 \ldots b_m$ in $V_T^*$.

Output: The minimum distance between $y$ and $L(G)$, where only
substitution error is considered.

Method:

(1) Set $N$ = the number of S-productions, min-distance = a
sufficiently large number.

(2) Set $i = 1$.

(3) The $i$th S-production has the form
$S_i \to A_{i1} A_{i2} \cdots A_{iM_i}$, where $M_i$ is the number of
nonterminals on the right-hand side of the $i$th S-production,
$A_{il} \in V_N$, $1 \le l \le M_i$.

(4) Set dist = 0, $k = 1$, $l = 1$.

(5) (a) If $k > \sum_{p=1}^{l} L(A_{ip})$, then $l = l + 1$.

(b) Apply the $A_{il}$-production and compute
dist = dist + $S(a_{il}, b_k)$, $k = k + 1$. Note that there is a
one-to-one correspondence between $A_{il}$ and $a_{il} \in V_T$.

(6) If $k \le m$, go to step (5).

(7) If dist < min-distance, then min-distance = dist.

(8) $i = i + 1$. If $i \le N$, go to (3); otherwise min-distance is the
minimum distance between $y$ and $L(G)$. Exit.
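The steps above can be sketched in Python as follows. The function
name `min_substitution_distance`, the tuple representation of the
S-productions and the default unit substitution cost are assumptions
made for this illustration. The sketch also skips any S-production
whose total inherited length differs from the length of the input,
which is an added assumption: the algorithm as stated ignores length
variation and does not say what happens in that case.

```python
def min_substitution_distance(s_productions, y, sub_cost=None):
    """Minimum substitution distance between y and the strings of the grammar.

    s_productions: iterable of (nonterminals, lengths) pairs, where the
    nonterminal X derives its terminal x repeated lengths[i] times.
    Returns float('inf') if no production has the same total length as y.
    """
    if sub_cost is None:
        sub_cost = lambda a, b: 0 if a == b else 1  # assumed unit cost
    min_distance = float("inf")                      # step (1)
    for nonterminals, lengths in s_productions:      # steps (2), (3), (8)
        if sum(lengths) != len(y):
            continue  # assumption: only equal-length strings are compared
        dist, k = 0, 0                               # step (4)
        for nonterminal, length in zip(nonterminals, lengths):
            a = nonterminal.lower()
            for _ in range(length):                  # steps (5), (6)
                dist += sub_cost(a, y[k])
                k += 1
        min_distance = min(min_distance, dist)       # step (7)
    return min_distance
```

Each S-production is scanned once from left to right with no
backtracking, so the work per production is proportional to the length
of the input string.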

A  flow  chart  of  this  parsing  algorithm  is  given  in  Figure  4.4.  A  parse 
of  y  can  be  constructed  by  tracing  the  productions  used  in  Step  (3)  and 
(5)(b).  If  the  length  variation  is  to  be  considered  then  the  item  lists  will 
contain  a  large  number  of  items,  and  consequently  the  computation  will 
be  slow.  However,  Algorithm  3.2  is  unable  to  even  consider  the  length 
variation. 

The recognition results and the computation time for recognizing one
string are given in Table 4.1. While both the attributed cfg and the
nonattributed fsg show 91% correct recognition, the average
computation time for one string is 0.11 second using the attributed
seismic grammar and 2.55 seconds using the nonattributed finite-state
grammar. This is because the finite-state seismic grammar has a large
number of production rules and nonterminal symbols; a large table must
be maintained, and searching it is very time-consuming. Although
special-purpose hardware can be built to speed up the computation, it
is slow on a sequential computer. Algorithm 4.2 is also time-consuming
for a general context-free grammar. However, the seismic grammar in
Section 4.2 is a very special cfg, and the application of the
production rules is very straightforward. The actual storage used in
the computer is also given in Table 4.1.

TABLE 4.1

The recognition results, computation time, and memory used for
seismic recognition using an attributed context-free grammar and a
nonattributed finite-state grammar. (Time is for one string.)

                      Accuracy   Average time   Memory used
                      rate       (sec)          (bytes)

Attributed cfg        91%        0.11           41360
Nonattributed fsg     91%        2.55           72804

We mentioned earlier that substitution, insertion and deletion of
terminal symbols are allowed but that no substitution, insertion or
deletion of nonterminal symbols is permitted. As a matter of fact,
substitution of nonterminal symbols can be attained in terms of
substitution of terminal symbols. Therefore, it is only insertion and
deletion of nonterminal symbols that are not allowed. The reason is
that if the training samples are well selected, the grammar should be
able to recognize any reasonable string. If a test string needs
insertion or deletion of nonterminal symbols in order to be accepted,
it is either severely distorted or missing some string segments. If
insertion and deletion of nonterminal symbols are to be considered,
then this becomes a structural-deformation problem (Tsai and Fu,
1979). We can add insertion and deletion error transformations in
Step (6) of Algorithm 4.2, as we did in Algorithm 2.4. This will make
the algorithm more complicated. A distance threshold should be imposed
to eliminate unrealistic parses so that the item lists do not become
unmanageable.


CHAPTER V

VLSI  ARCHITECTURES  FOR  SYNTACTIC 
SEISMIC  PATTERN  RECOGNITION 

5.1  Introduction 

Some computational algorithms, for example, matrix multiplication and
inversion in numeric computation and string matching in nonnumeric
computation, are so time-consuming that an efficient implementation
has usually been neither feasible nor economical. However, this
situation has changed owing to advances in hardware technology, i.e.,
the development of high-speed, high-density and low-cost electronic
devices. Hardware implementation (particularly parallel and/or
pipeline processing) of software algorithms has become an affordable
way to increase the processing speed because the cost of hardware is
decreasing. Advances in VLSI technology make it possible to pack more
components into one chip at a lower price than ever before (Mead and
Conway, 1980). This has stimulated considerable interest in developing
parallel algorithms for VLSI implementation and in building
special-purpose chips for specific applications (Kung, 1979, 1980). A
whole book (Bowen and Brown, 1982) has been devoted to VLSI systems
design for digital signal processing. Many computers and processors
have been developed for signal processing. The recent trend is to use
attached signal processors, e.g., the Lincoln Laboratory Fast Digital
Processor (FDP) and the Data General AP/130 array processor, instead
of supercomputers such as the ILLIAC-IV and the Advanced Scientific
Computer (ASC) (Bowen and Brown, 1982). More specialized applications
for matrix multiplication, convolution and solving linear equations
can be found in Kung (1979, 1982), Kulkarni and Yen (1982), and Hwang
and Cheng (1981). A recent example of a special-purpose VLSI
architecture is an integrated multiprocessing array for time-warping
pattern matching, which is used in speech recognition (Ackland, Weste
and Burr, 1981). Pattern matching is the most time-consuming stage in
speech recognition, especially when the dictionary is large. Parallel
processing improves the speed by a factor of 200, thereby making
real-time application possible.

Like dynamic time warping, string distance computation and string
matching are time-consuming. A hardware implementation for computing
the Levenshtein distance was proposed by Okuda, Tanaka and Kasai
(1976), even before VLSI technology was available. They used delay
circuits to implement the insertion, deletion and substitution
weights.
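For reference, the weighted Levenshtein distance that such hardware
computes can be written as a short dynamic program in software. The
sketch below is a sequential illustration only, not the delay-circuit
or systolic implementation. The function name and the default unit
weights are assumptions; symbol-dependent weights can be supplied
through the three cost functions.

```python
def weighted_levenshtein(x, y, ins=None, dele=None, sub=None):
    """Minimum total weight of insertions, deletions and substitutions
    needed to transform string x into string y."""
    ins = ins or (lambda b: 1)                        # cost of inserting b
    dele = dele or (lambda a: 1)                      # cost of deleting a
    sub = sub or (lambda a, b: 0 if a == b else 1)    # cost of a -> b
    # prev[j] = distance between the processed prefix of x and y[:j]
    prev = [0]
    for b in y:
        prev.append(prev[-1] + ins(b))
    for a in x:
        cur = [prev[0] + dele(a)]
        for j, b in enumerate(y, start=1):
            cur.append(min(prev[j] + dele(a),         # delete a
                           cur[j - 1] + ins(b),       # insert b
                           prev[j - 1] + sub(a, b)))  # substitute or match
        prev = cur
    return prev[-1]
```

Only two rows of the distance table are kept at a time, so the memory
used is proportional to the length of the second string.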

We propose in this chapter a VLSI architecture for seismic
classification using the syntactic approach, which includes feature
extraction, primitive recognition and string matching. Our string
matching implementation is more complicated than that of Okuda et al.,
since different weights are assigned to different symbols in our case.
This special-purpose processor is designed to be attached to a host
computer, for example, a minicomputer as shown in Figure 5.1, so that
it works like a peripheral processor. Three systolic arrays are
proposed to perform feature extraction, primitive recognition and
string matching, respectively. Several memory units are required for
holding the intermediate results and for data setup. Figure 5.2 shows
the architecture of our special-purpose processor. All three systolic
arrays perform in time $O(1)$, i.e., results are produced at a
constant rate provided that input data are supplied properly in a
pipelined fashion. The formats of the input data are given in Figure
5.3, where (a) is for feature extraction, (b) is for primitive
recognition and (c) is for string matching. Section 5.2 discusses VLSI
architectures for feature extraction. Section 5.3 discusses VLSI
architectures for primitive recognition. Section 5.4 discusses VLSI
architectures for string matching. Section 5.5 shows some simulation
results and performance verification. Section 5.6 gives the concluding
remarks.

Figure 5.1  The special-purpose processor is attached to a host
computer as a peripheral processor.


5.2  VLSI  Architectures  for  Feature  Extraction 

The systolic array for feature extraction is linearly connected, as
shown in Figure 5.4. The input data, which are the digitized and
quantized signal waveform coded in binary form, are stored in separate
memory modules in a skewed format, as shown in Figure 5.3(a) and
Figure 5.4(a). From left to right, each memory module is delayed by
one unit time, i.e., the time required to process one data element.
Each memory module contains a sequence of words, i.e., discrete signal
points, and is connected to a processing element (PE) of the systolic
array. The data are transferred into the PEs bit by bit, and all the
memory modules are read in parallel. Two features, the zero-crossing
count and the sum of absolute magnitudes, are computed. The absolute
sum is used here instead of log energy for simplicity of
implementation; a logarithmic function can be approximated by taking a
series expansion (see Ackland et al., 1981). A zero-crossing is
detected by checking the signs of every two consecutive points. Any
sign change is counted as one zero-crossing. An exclusive-OR circuit
is used to detect a sign change. Figure 5.4(b) shows the operation of
each PE. The internal structures are given in Figure 5.5. All $n$ PEs
compute the two features simultaneously and pass the partial results
to the next PE. Each of the general-purpose registers A, B, C, E and S
is 16 bits long. The micro-operations of each PE are as follows.

(1) (a) Transfer (serially) input data into Register A from external
storage.

(b) Transfer (serially) input data into Register B from Register A of
the left PE.

(c) Transfer (serially) partial result into Register C from Register C
of the left PE.

(d) Transfer (serially) partial result into Register S from Register S
of the left PE.

(e) $C \leftarrow C + (\mathrm{sgn}(A) \oplus \mathrm{sgn}(B))$.

(2) $E \leftarrow |A|$.

(3) $S \leftarrow S + E$.

Figure 5.5  The internal structure of the processor for feature
extraction.

Steps (1)(a) to (1)(e) can be executed in parallel and can therefore
be completed in 16 machine cycles. Steps (2) and (3) can each be
completed in one machine cycle. The entire set of operations (1), (2)
and (3) takes 18 machine cycles to complete. The time for each
processor to complete its entire set of operations, i.e., 18 machine
cycles here, is called a unit time. Although a memory cycle is slower
than a machine (processor) cycle, each memory fetch can take as long
as 18 machine cycles, so data input can keep up with the processor
speed. Suppose that the input data are fed in properly; then after $n$
unit times, where $n$ is the number of data points in one segment, the
features of the first segment will emerge from the end of the systolic
array. A set of features (for one segment) comes out every unit time
thereafter. Therefore, once the systolic array reaches steady state,
each segment takes only 1 unit time, i.e., 18 machine cycles, to
complete the computation. With a uniprocessor, each segment takes
$O(n)$ computations and comparisons. The speedup is $n$, which is
equal to the number of processors.
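The two features and the pipeline timing described above can be
sketched sequentially in Python. This is a software illustration
rather than a model of the PE hardware. The function names are
hypothetical, and a sample equal to zero is treated as non-negative,
which is an assumption corresponding to a two's-complement sign bit.

```python
def segment_features(samples):
    """Return (zero-crossing count, sum of absolute magnitudes) for one segment."""
    zero_crossings = 0
    abs_sum = 0
    previous = None
    for x in samples:
        if previous is not None:
            # Exclusive-OR of the two sign bits: 1 exactly when the sign changes.
            zero_crossings += int((previous < 0) ^ (x < 0))
        abs_sum += abs(x)
        previous = x
    return zero_crossings, abs_sum


def pipeline_cycles(num_segments, n, cycles_per_unit=18):
    """Machine cycles until the last of num_segments results has emerged.

    The first result appears after n unit times (n = points per segment),
    and one further result appears every unit time after that.
    """
    if num_segments < 1:
        return 0
    return (n + num_segments - 1) * cycles_per_unit
```

For example, with a hypothetical segment length of 60 points, the
first feature pair is available after 60 unit times, and ten segments
are finished after 69 unit times rather than 600.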


5.3  VLSI  Architectures  for  Primitive  Recognition 

In the primitive recognition problem, we compute the distance between
the unknown feature vector and the reference vector, for example, the
mean vector, of each cluster (primitive), and then assign the unknown
feature vector to the cluster with the minimum distance. This
procedure can be divided into two steps: first, compute the distances
between the unknown vector and the reference vectors, and then select
the smallest distance. We use a processor array, which contains
'compute' processors, for distance computation and a processor array,
which contains 'compare' processors, for selecting the smallest
distance. Suppose there are $l$ primitives; each primitive $i$ has a
reference feature vector $[m_1^i, m_2^i, \ldots, m_k^i]$, where $k$ is
the total number of features. A processor array of $l$ by $k$ which
performs the distance computation is shown in Figure 5.6. The
reference vectors of the primitives enter from the bottom and move up,
while the unknown feature vectors enter from the top and move down.
The partial sums move from left to right. The data must be properly
skewed, as shown in Figure 5.6 and Figure 5.3(b). Since the two data
streams move in opposite directions, successive items in each stream
must be separated by one unit time, which is shown by one space in
Figure 5.6; otherwise, some data will just pass instead of meeting
each other.

The unknown feature vectors are assumed to come in continuously. The
reference vectors must also repeat their cycle continuously, i.e.,
with the first primitive vector coming right after the $l$th primitive
vector. After initiation, the feature vectors are delayed for $l - 1$
unit times so that the first feature vector and the first primitive
vector meet at the first row of the processor array. The sum, which is
equal to zero initially, will be the distance at the end of the
computation. The functional diagram of each 'compute' processor is
shown in Figure 5.7(a), where $x$ is a component of the unknown
feature vector, $u$ is a component of the primitive vector and $a$ is
the partial sum. For simplicity, we use the absolute-value distance
here; Euclidean distance computation would take more space and time.

The internal structure and data movement are shown in Figure 5.8(a).
Each 'compute' processor contains an arithmetic and logic unit (ALU)
and four 16-bit registers A, B, U and X. The micro-operations are as
follows.

(1) (a) Transfer data (serially) into register X from the PE above.

(b) Transfer data (serially) into register U from the PE below.

(c) Transfer partial sum (serially) into register A from the left PE.

(2) $B \leftarrow X - U$.

(3) $B \leftarrow |B|$.

(4) $A \leftarrow A + B$.

Step (1) takes 16 clock cycles to transfer one 16-bit word; steps (2),
(3) and (4) take 1 clock cycle each. The entire set of operations
takes 19 clock cycles, so the unit time here is 19 clock cycles.

After computation on the corresponding components of the reference
vector and the unknown feature vector, the partial sum moves to the
right. When the partial sum passes the $k$th processor of the first
row, the output is the distance between vectors
$[x_1^1, x_2^1, \ldots, x_k^1]$ and $[m_1^1, m_2^1, \ldots, m_k^1]$; it
then enters the rightmost column of processors, which are the
'compare' processors. Since the data streams are separated by one unit
time, the processors on alternate diagonals (from lower left to upper
right) are idle. When vector $[x_1^1, x_2^1, \ldots, x_k^1]$ enters the
second row of the processor array, it meets vector
$[m_1^2, m_2^2, \ldots, m_k^2]$. When vector
$[x_1^1, x_2^1, \ldots, x_k^1]$ enters the third row, it meets vector
$[m_1^3, m_2^3, \ldots, m_k^3]$; meanwhile, vector
$[x_1^2, x_2^2, \ldots, x_k^2]$ meets vector
$[m_1^2, m_2^2, \ldots, m_k^2]$ at row one. We can see from the above
and Figure 5.6 that vector $[x_1^1, x_2^1, \ldots, x_k^1]$ is compared
with the reference vectors in the sequence $1, 2, \ldots, l$, vector
$[x_1^2, x_2^2, \ldots, x_k^2]$ is compared with the reference vectors
in the sequence $2, 3, \ldots, l, 1$, and so forth. These operations
are overlapped, i.e., pipelined, in such a way that every processor
does part of the computation and passes the data and results to the
neighboring processors.
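The cyclic order in which each feature vector meets the reference
vectors can be written down compactly. The helper below is a
hypothetical illustration; it extrapolates the "and so forth" in the
text by assuming that the $j$th feature vector starts with reference
vector $j$ (modulo $l$) at row one and advances by one reference
vector per row.

```python
def comparison_order(j, l):
    """Reference-vector indices met by the j-th feature vector at rows 1..l.

    Indices are 1-based, as in the text; l is the number of primitives.
    """
    return [(j - 1 + row) % l + 1 for row in range(l)]
```

With four primitives, for instance, the first feature vector meets the
references in the order 1, 2, 3, 4 and the second in the order
2, 3, 4, 1, so every feature vector is compared with every reference
vector exactly once.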


The functional diagram of the 'compare' processor is shown in Figure
5.7(b), where $a$ is the minimum distance computed so far, with
primitive identifier $c$, $b$ is the distance just computed and $d$ is
the corresponding primitive identifier input externally. The internal
structure and data movement are shown in Figure 5.8(b). Each 'compare'
processor contains an ALU, two 8-bit registers B and D, and two 16-bit
registers A and C. The micro-operations are as follows.

(1) (a) Transfer partial sum (serially) into register A from C of the
PE above.

(b) Transfer partial sum (serially) into register C from the left PE.

(c) Transfer primitive identifier (serially) into register B from D of
the PE above.

(d) Transfer primitive identifier (serially) into register D from
external input.

(2) $E \leftarrow A - C$.

(3) If $A < C$ then $\{C \leftarrow A;\ D \leftarrow B\}$.

Step (1) takes 16 cycles to complete, step (2) takes 1 and step (3)
takes 1. These three steps take 18 cycles, which is 1 cycle shorter
than for the 'compute' processor; therefore, the 'compare' processor
must idle for one cycle in order to synchronize with the 'compute'
processor. The 'compare' processors compare the current distance
coming from the left with the distance coming from above, and pass the
smaller one to the processor below. Primitive identifiers are fed in
from the right in a format similar to that of the data streams. The
identifier streams should be delayed for $l + k - 1$ unit times so
that the first identifier enters the first 'compare' processor at the
same time as the distance between $[x_1^1, x_2^1, \ldots, x_k^1]$ and
$[m_1^1, m_2^1, \ldots, m_k^1]$. In order to assign the right
identifier to the right distance, the identifier streams must be
arranged as shown in Figure 5.6.

Figure 5.7  Data flow and operations of each (a) 'compute' processor
and (b) 'compare' processor.

Figure 5.8  Internal structure and register transfer of (a) 'compute'
and (b) 'compare' processors.

With a uniprocessor, the primitive recognition procedure for one
feature vector takes l × k computations and l − 1 comparisons in our
present example. With the processor array of Figure 5.6, the primitive
recognition procedure for a single feature vector needs l × k + l unit
times. However, a processor array is not designed to process a single
datum; it is designed for a stream of data. In that case, a new result
comes out every 2 unit times in Figure 5.6. Given l reference vectors
and feature vectors of dimension k, the array processor takes 2 unit
times to produce one result in steady state, while a uniprocessor takes
O(l × k) time to complete the computation. The speedup is l × k / 2. In
Figure 5.6, the results contain both the minimum distance and the
primitive identifier; therefore no other processing is required.
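
The primitive recognition step can be sketched in software as follows.
This is only an illustration under stated assumptions: the distance is
taken to be a city-block distance accumulated one feature at a time
(mirroring the partial sums of the 'compute' processors), and the
function name, reference vectors and identifiers are hypothetical, not
values from the experiment.

```python
def recognize_primitive(feature, references):
    """Return (min_distance, identifier) for one feature vector.

    references: list of (identifier, reference_vector) pairs.
    The distance is assumed to be city-block (sum of absolute differences).
    """
    best_dist, best_id = None, None
    for ident, ref in references:
        # 'compute' processors: accumulate the partial sum over the k features
        partial = 0
        for x, m in zip(feature, ref):
            partial += abs(x - m)
        # 'compare' processor: keep the smaller distance and its identifier
        if best_dist is None or partial < best_dist:
            best_dist, best_id = partial, ident
    return best_dist, best_id


# Hypothetical reference vectors: (zero-crossing count, log energy)
refs = [("a", (2, 10)), ("b", (8, 4)), ("c", (5, 7))]
print(recognize_primitive((7, 5), refs))
```

The sequential loop performs l × k accumulations and one comparison per
reference vector, which is the work the array spreads over its
'compute' and 'compare' processors.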

The primitive recognizer is essentially a vector pattern matcher.
Therefore it can be used in many other applications, independently of
feature extraction and string matching.

5.4  VLSI  Architectures  for  String  Matching 
Based  on  Levenshtein  Distance 

Nonnumeric computation has become more important and has recently
demanded more hardware algorithms (i.e., algorithms specially designed
for hardware implementation) and architectures, owing to growing
applications in artificial intelligence, databases, information
retrieval, language translation, pattern recognition, etc. One of the
most important categories in nonnumeric computation is string pattern
matching. Character string matching is very important in information
retrieval and dictionary lookup (Hall and Dowling, 1980). The problem
of string pattern matching can generally be classified into two kinds,
which we call exact matching and approximate matching. For exact
matching, a single string is matched against a set of strings; usually
this particular string is embedded as a substring of the reference
strings. Hardware algorithms for exact matching have been proposed by
Mukhopadhyay (1979), where the test pattern resides in an array of
cells and the reference text is broadcast to all the cells
simultaneously, character by character. Foster and Kung (1980) designed
a VLSI chip for exact pattern matching with wild card capability, where
the test pattern enters from one end and the reference text enters from
the other end of the linear array. By contrast, for approximate
matching, we want to find a string from a finite set of strings which
approximately matches the test string. We will of course also find the
string which exactly matches the test string if it exists. A good
survey of approximate string matching can be found in Hall and Dowling
(1980). This section concentrates exclusively on approximate matching.
Approximate string matching is based on the idea of insertion, deletion
and substitution of terminal symbols. An application of approximate
string matching which cannot be performed by exact string matching is
the string clustering problem, for example, in Lu and Fu (1978). Wagner
and Fischer (1974) proposed a dynamic programming method for the
computation. Okuda, Tanaka and Kasai (1976) proposed an algorithm and
hardware implementation for garbled word recognition based on the
Levenshtein metric. We propose in this section a VLSI architecture for
approximate string matching. The distance measure is the (weighted)
Levenshtein distance, computed by dynamic programming. Although it uses
the minimum-distance criterion in deterministic cases, it can easily be
modified to use the maximum-likelihood criterion in probabilistic cases.

Chiang and Fu (1979) studied several parallel architectures, namely
SIMD, dedicated SIMD and MIMD, for string and tree distance
computation. Each node on the same diagonal of the dynamic programming
matrix is computed simultaneously. The time complexity of these
specific parallel systems is O(n + m), where n and m are the lengths of
the two strings under comparison. Our system differs from theirs in
that we use a systolic array, i.e., a square array of PE's as in
Ackland et al. (1981), and pipelined data flow for the computation.
Therefore we can obtain the results at a constant rate, i.e., one
result after each unit time.

It is well known that Levenshtein distance can be computed by dynamic
programming. Therefore, it can be implemented by parallel processing on
VLSI architectures. In this case, parallel computation and pipelined
data flow are combined to process a large amount of data continuously
at a very high speed. The dynamic programming algorithm recursively
computes the optimal path from point (1,1) to (m,n) based on its
subpaths. In dynamic time warping, there are many slope constraints for
selecting subpaths. Ackland et al. (1981) chose the simplest
constraint, i.e.,

$$S_{i,j} = D_{i,j} + \min\{S_{i-1,j},\; S_{i-1,j-1},\; S_{i,j-1}\}$$

where $D_{i,j} = |x_i - y_j|$, $x_i$ and $y_j$ are feature vectors, and
$S_{i,j}$ is the partial sum at point (i, j). Other slope constraints
would have been much more difficult to implement.
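
The recurrence with the simplest slope constraint can be sketched
sequentially as below. This is an illustrative sketch rather than the
design of Ackland et al.: scalar features and an absolute-difference
local distance are assumptions made for brevity, and the function name
is hypothetical.

```python
def dtw_distance(x, y):
    """Accumulated distance for non-empty sequences x and y under
    S[i][j] = D[i][j] + min(S[i-1][j], S[i-1][j-1], S[i][j-1])."""
    m, n = len(x), len(y)
    inf = float("inf")
    S = [[inf] * n for _ in range(m)]
    for i in range(m):
        for j in range(n):
            d = abs(x[i] - y[j])  # local distance D[i][j] (scalar features assumed)
            if i == 0 and j == 0:
                S[i][j] = d
            else:
                best_prev = min(
                    S[i - 1][j] if i > 0 else inf,
                    S[i - 1][j - 1] if i > 0 and j > 0 else inf,
                    S[i][j - 1] if j > 0 else inf,
                )
                S[i][j] = d + best_prev
    return S[m - 1][n - 1]


print(dtw_distance([1, 2, 3], [1, 2, 2, 3]))
```

Each cell depends only on its three immediate neighbors, which is what
makes a nearest-neighbor connected array sufficient for this
constraint.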


5.4.1  Levenshtein  Distance 


For Levenshtein distance, there are also many variations. The original
Levenshtein distance, where each insertion, deletion and substitution
is counted as one error transformation, is the easiest to implement. We
have developed a processor array for this computation. A portion of the
dynamic programming diagram and its corresponding processor array is
given in Figure 5.9. Each processor computes the partial sum

$$S_{i,j} = \min\{S_{i-1,j-1} + \delta(a_i, b_j),\; S_{i-1,j} + 1,\; S_{i,j-1} + 1\}$$

where $\delta(a_i, b_j) = 1$ if $a_i \neq b_j$ and $\delta(a_i, b_j) = 0$
otherwise. The computation can be divided into three stages. The
procedures are as follows.

Stage  1 

(1) (a) Transfer (serially) partial sum into D from the lower PE.

(b) Transfer (serially) primitive $a_i$ into X from the lower PE.

(c) Transfer (serially) primitive $b_j$ into Y from the left PE.

(d) Compare (serially) X with Y; output V = 0 if X = Y, V = 1 otherwise.

(2) D ← D + V.

Stage 2

(1) (a) Transfer (serially) partial sum into B from the left PE.

(b) Transfer (serially) partial sum $S_{i,j-1}$ into C from the lower PE.

(c) Send (serially) partial sum to D of the above PE.

(d) Compare (serially) B with C; A ← min(B, C).

(e) Send (serially) contents of X to X of the above PE.

(f) Send (serially) contents of Y to Y of the right PE.

(2) A ← A + 1.

(3) Compare (in parallel) A with D; R ← min(A, D).

Stage 3

(1) (a) Send (serially) partial sum R to B of the right PE.

(b) Send (serially) partial sum R to C of the above PE.
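
The arithmetic carried out by one PE in the stages above can be
emulated as follows. This is a sequential sketch, not the hardware: the
function names are hypothetical, and the boundary values (row and
column 0 of the matrix) follow the usual dynamic programming convention,
which is an assumption here since the array feeds them in from outside.

```python
def pe_step(d, b, c, x, y):
    """One processing element.

    d: diagonal partial sum, b: partial sum from the left PE,
    c: partial sum from the lower PE, x and y: the two primitives.
    """
    v = 0 if x == y else 1   # Stage 1, step (1)(d)
    d = d + v                # Stage 1, step (2)
    a = min(b, c)            # Stage 2, step (1)(d)
    a = a + 1                # Stage 2, step (2)
    return min(a, d)         # Stage 2, step (3): the result R


def levenshtein(a, b):
    """Apply pe_step at every point (i, j), one point at a time."""
    m, n = len(a), len(b)
    S = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        S[i][0] = i          # assumed boundary: i deletions
    for j in range(n + 1):
        S[0][j] = j          # assumed boundary: j insertions
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            S[i][j] = pe_step(S[i - 1][j - 1], S[i - 1][j], S[i][j - 1],
                              a[i - 1], b[j - 1])
    return S[m][n]


print(levenshtein("kitten", "sitting"))
```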

Stage 1 takes 17 clock cycles to complete (16 for step (1) and 1 for
step (2)); stage 2 takes 18 (16 for step (1), 1 for step (2) and 1 for
step (3)); and stage 3 takes 16. Figure 5.10 shows the internal
structure and the operations of processor element $P_{i,j}$ at stages
1, 2 and 3. Each PE contains a set of registers, an ALU, a control unit
and some other combinational logic. Registers A, B, C, D, V and R are
general-purpose registers which are 16 bits long and connected to the
ALU. Registers X and Y are 8 bits long and are used to store
primitives. In our seismic case, we have 13 primitives; therefore, 4
bits should be enough to represent them. In fact, 4 bits, which give 16
combinations, should be sufficient for many practical applications.
However, in order to make our system more flexible and compatible with
other systems which use ASCII code, we let registers X and Y hold 8
bits. This generalization makes it possible to recognize character
strings where each character is in ASCII code, for example,
A = '01000001', B = '01000010', C = '01000011', and so forth.

Figure 5.11 shows the data movement between the 4 neighboring PE's
shown in Figure 5.9. All the processors on the same diagonal perform
the same computation, as shown in Figures 5.11 and 5.12(a). This
pattern moves forward one step every 18 clock cycles. Since each string
needs only three diagonals at any time, the other processors can be
used to compute distances of other strings. Therefore, data flow can be
pipelined as shown in Figure 5.12(b). If we are matching a test string
against a number of reference strings, the distance between the test
string and the first reference string will emerge after p × 18 clock
cycles, where p is the number of diagonals in the array. After that,
one string distance comes out every 3 × 18 = 54 clock cycles. Since
stages 1 and 3 do not conflict, they can be overlapped, i.e., one
diagonal of the array can perform stage 3 of one string and stage 1 of
the next string at the same time, to increase the throughput.
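
The diagonal organization can be illustrated with a sequential sketch
that evaluates the matrix one anti-diagonal at a time and keeps only
the two previous diagonals, so that three diagonals are live at any
moment. The helper `pipeline_cycles` simply restates the timing quoted
above (p × 18 cycles for the first distance, 54 cycles for each later
one). Both function names are hypothetical, and the boundary values are
the conventional ones, assumed here.

```python
def levenshtein_wavefront(a, b):
    """Levenshtein distance computed one anti-diagonal (i + j = t) at a time.

    Cells on the same diagonal do not depend on each other, so an array
    could evaluate a whole diagonal in one unit time.
    """
    m, n = len(a), len(b)
    prev2, prev1 = {}, {}            # diagonals t-2 and t-1, keyed by row i
    for t in range(m + n + 1):
        cur = {}
        for i in range(max(0, t - n), min(m, t) + 1):
            j = t - i
            if i == 0:
                cur[i] = j           # assumed boundary value
            elif j == 0:
                cur[i] = i           # assumed boundary value
            else:
                sub = prev2[i - 1] + (0 if a[i - 1] == b[j - 1] else 1)
                dele = prev1[i - 1] + 1   # from point (i-1, j)
                ins = prev1[i] + 1        # from point (i, j-1)
                cur[i] = min(sub, dele, ins)
        prev2, prev1 = prev1, cur
    return prev1[m]


def pipeline_cycles(num_strings, p, unit_cycles=18, units_per_string=3):
    """Clock cycles until the last of num_strings distances has emerged."""
    return p * unit_cycles + (num_strings - 1) * units_per_string * unit_cycles


print(levenshtein_wavefront("kitten", "sitting"))
print(pipeline_cycles(100, p=39))
```

The value p = 39 in the usage line is only an example (the number of
anti-diagonals of a 20 × 20 grid); the text does not state p for a
particular array.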

The structure of the processor array and the data flow are shown in
Figure 5.13. The reference strings enter from the left; the test string
enters from the bottom. The test string must repeat itself continuously
in order to be compared with all the reference strings. Both test and
reference strings must be properly skewed and separated as shown in
Figure 5.13 so that they arrive at the right processors at the right
time. The bookkeeping and selection of the minimum distance can be done
by a special-purpose processor or the host computer. One practical
problem concerns the dimensions of the processor array. The number of
rows can be set to the maximum length of the reference strings. Since
the length of the test string is unknown, the number of columns can be
set arbitrarily. If a test string exceeds the array size, it should be
handled by the host computer or a preprocessor, because interrupting
the regular computation pattern in a VLSI array greatly reduces its
efficiency. This situation can be kept to a minimum by selecting a
reasonably large array size. A shorter string is padded with blanks to
make its length equal to the array dimension.

Figure 5.11 Data movement between PE's.

Figure 5.12 Processors on the same diagonal perform the same operation;
three diagonals are required for one string (a), and strings can be
pipelined (b).

Suppose both the reference and the test strings have length L. With a
uniprocessor, the matching process for one unknown string takes
O(L × L) unit operations. With the array processor, it takes only 3
unit times in steady state.


5.4.2  Weighted  Levenshtein  Distances 

Since a weighted Levenshtein distance is usually preferable in
practical applications, we now propose a VLSI architecture for its
computation. The major problem here is to store all the weights in each
processor in a way that is easy to implement and fast to access.
Fortunately, a programmable logic array (PLA) can be used (Mead and
Conway, 1980). It is a special type of read-only memory, and is easy to
implement in a VLSI system. A simple example will illustrate how a PLA
works. Figure 5.14 shows a simple weight table and its PLA
implementation. A PLA consists of two parts: the left part is called
the AND plane and the right part is called the OR plane. Input lines A,
B take the combinations (0,0), (0,1), (1,0), (1,1), which represent the
entries of the weight table. The outputs XYZ indicate the values of the
entries, which range from 0 to 7. The circles indicate connections.
Since we have only 13 primitives, 4 bits are enough for discrimination.
We take the 4 least-significant bits (LSB) of the primitives for our
internal computation, for example, a = '0001', b = '0010', c = '0011',
and so forth. More bits are needed for recognition of character
strings. Figure 5.15 shows the PLA implementation of the weight table
for substitution, insertion and deletion in our seismic case. There is
an input register to the AND plane and an output register from the OR
plane; both are 8 bits long. Register X contains primitive 'a', and
register Y contains primitive 'b'; (a, b) is the entry of the weight
table. Here the symbols X, Y, A and B are registers, which should not
be confused with those in Figure 5.14. The pair (X = a, Y = b)
represents the substitution of 'b' for 'a'. The pair (X = a, Y = 0000)
represents the deletion of 'a'. The pair (X = 0000, Y = b) means the
insertion of 'b'. The access time is very short, only two clock cycles:
one for the input register and one for the output register.
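
The addressing scheme of the weight table can be mimicked in software
with a small lookup keyed by the 4 LSB of each primitive. This is a
sketch of the idea only: the weights and the three primitive codes
below are made-up values for illustration, and a dictionary stands in
for the AND and OR planes.

```python
NULL = 0b0000  # constant used for the empty position in insert/delete entries

# Hypothetical 4-bit primitive codes and weights, for illustration only
A, B, C = 0b0001, 0b0010, 0b0011
WEIGHTS = {
    (A, A): 0, (B, B): 0, (C, C): 0,
    (A, B): 2, (B, A): 2, (A, C): 3, (C, A): 3, (B, C): 1, (C, B): 1,
    (A, NULL): 4, (B, NULL): 4, (C, NULL): 4,   # deletions
    (NULL, A): 5, (NULL, B): 5, (NULL, C): 5,   # insertions
}


def pla_lookup(x, y):
    """Form the 8-bit table input from the 4 LSB of X and Y and return
    the stored weight."""
    q = ((x & 0x0F) << 4) | (y & 0x0F)       # contents of the input register
    return WEIGHTS[(q >> 4, q & 0x0F)]       # contents of the output register


print(pla_lookup(ord("a"), ord("b")))  # 8-bit ASCII inputs, only the 4 LSB are used
print(pla_lookup(A, NULL))             # deletion of 'a'
print(pla_lookup(NULL, B))             # insertion of 'b'
```

Masking with the 4 LSB is what lets the same table serve both the 4-bit
seismic primitives and 8-bit codes whose low bits match.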

Except for the weight table, the computation procedure is similar to
the previous one. The internal structure of the PE's is given in Figure
5.16. Each PE has an ALU, a PLA (with registers Q and S), a control
unit, two 8-bit registers X and Y, and three 16-bit registers B, C and
D. Register Z contains the constant '0000', which stands for the empty
symbol in deletion and insertion entries. The data movement is similar
to that in Figure 5.11.

Stage  1 

(1) (a) Transfer (serially) partial sum $S_{i-1,j-1}$ into D from the lower PE.

(b) Transfer (serially) primitive $a_i$ into X from the lower PE.

(c) Transfer (serially) primitive $b_j$ into Y from the left PE.

(2) Load (in parallel) the 4 LSB of X and Y into Q; output $S(a_i, b_j)$ in S.

(3) Compute D ← D + S.

(4) Load (in parallel) the 4 LSB of X and Z into Q; output $D(a_i)$ in S.

Stage 2

(1) (a) Transfer (serially) partial sum $S_{i-1,j}$ into B from the left PE.

(b) Transfer (serially) partial sum $S_{i,j-1}$ into C from the lower PE.

(c) Send (serially) partial sum to D of the above PE.

(d) Send (serially) contents of X to X of the above PE.

(e) Send (serially) contents of Y to Y of the right PE.

(2) (a) Compute B ← B + S.

(b) Load (in parallel) the 4 LSB of Z and Y into Q; output $I(b_j)$ in S.

(3) (a) Compute C ← C + S.

(b) Compute B ← min(B, D).

(4) Compute D ← min(B, C).

Stage 3

(1) (a) Send (serially) partial sum in D to B of the right PE.

(b) Send (serially) partial sum in D to C of the above PE.

In Stage 1, step (1) takes 16 cycles ((a), (b) and (c) operate in
parallel), step (2) takes 3 cycles (1 for loading, 2 for PLA reading),
step (3) takes 1 cycle and step (4) takes 3 cycles (same as step (2)).
In Stage 2, step (1) takes 16 cycles, step (2) takes 3 cycles ((a) and
(b) operate in parallel), step (3) takes 2 cycles and step (4) takes 2
cycles. Stage 3 takes 16 cycles ((a) and (b) each take 16 cycles and
can be executed in parallel). Therefore, Stage 1 takes 23 cycles, Stage
2 takes 23 cycles, and Stage 3 takes 16 cycles. As before, stage 3 can
be overlapped with stage 1 to save processing time. Because of the
weight computation, this system takes longer than the previous one.
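
The weighted computation performed by the array can be sketched
sequentially as follows. The comments map each line to the stage that
performs it in the procedure above. The function name and the way the
three weight functions are passed in are hypothetical, and the boundary
values are the conventional ones, assumed here.

```python
def weighted_levenshtein(a, b, sub, dele, ins):
    """Weighted Levenshtein distance between sequences a and b.

    sub(x, y): substitution weight, dele(x): deletion weight,
    ins(y): insertion weight.
    """
    m, n = len(a), len(b)
    S = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        S[i][0] = S[i - 1][0] + dele(a[i - 1])    # assumed boundary
    for j in range(1, n + 1):
        S[0][j] = S[0][j - 1] + ins(b[j - 1])     # assumed boundary
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            d = S[i - 1][j - 1] + sub(a[i - 1], b[j - 1])  # Stage 1, step (3)
            bb = S[i - 1][j] + dele(a[i - 1])              # Stage 2, step (2)(a)
            c = S[i][j - 1] + ins(b[j - 1])                # Stage 2, step (3)(a)
            bb = min(bb, d)                                # Stage 2, step (3)(b)
            S[i][j] = min(bb, c)                           # Stage 2, step (4)
    return S[m][n]


# Illustrative weights: an unequal substitution costs 3, insertion and deletion 1
cost = weighted_levenshtein("ab", "ac",
                            sub=lambda x, y: 0 if x == y else 3,
                            dele=lambda x: 1,
                            ins=lambda y: 1)
print(cost)
```

With these example weights a deletion followed by an insertion is
cheaper than one substitution, which is the kind of trade-off a weight
table makes possible.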


5.5  Simulations  and  Performance  Verification 

Simulations have been performed for the three systolic arrays: the
feature extraction array, the primitive recognition array and the
string matching array. The flow charts for the simulations are given in
Appendix A. The same seismic data as used in Section 3.5 are tested in
the simulations. The results of the simulations are exactly the same as
those of the sequential computer in Section 3.5. Therefore the designs
of the systolic arrays are correct and the operations are as expected.
Step-by-step simulation results using sample seismic waveforms are
given in Appendix B. The computation time in our simulation is shown in
Table 5.1. The computation time using a sequential computer is also
given for comparison. Note that the listed computation time is an
average, approximate time which should be used for comparison only.
Suppose that we are dealing with a large amount of data. Similar to the
definition of speedup for a multioperation computer in Kuck (1978), we
define the theoretical speedup (TS) of a (systolic) processor array as

$$\mathrm{TS} = \frac{\text{time interval between consecutive results using a sequential computer}}{\text{time interval between consecutive results using a processor array}}$$


Therefore the TS for feature extraction is 60/1 = 60, for primitive
recognition 39/2 = 19.5 and for string matching 20/3 = 6.67. The
numerators are the numbers of operations needed to get one result using
a sequential computer, and the denominators are the time intervals
between consecutive results for the VLSI arrays as shown in the
previous sections. Note that the TS for string matching in our
experiment is 20/3 = 6.67 instead of 20 × 20/3 = 133. This is because
we consider only substitution errors, so the number of operations is
proportional to the string length, i.e., 20. If insertion and deletion
errors are to be considered, then the whole dynamic programming matrix
as shown in Figure 2.1 must be computed. In the seismic recognition
problem the size of the matrix is 20 × 20.

The real speedup in our simulations for feature extraction is
approximately the same as the maximum theoretical speedup. This is due
to the simple structure and data flow of the linearly connected
systolic array. The real speedup (17.4) for primitive recognition is
slightly less than the TS (19.5), at 89% of the TS. The reason is the
increased complexity of the array structure and data flow: more time is
spent on data movement. The real speedup (4.67) for string matching is
also less than the TS (6.67), at 70% of the TS. This is because the
array structure and data flow are even more complicated. The increasing
complexity can be seen from the designs in the previous sections. The
theoretical speedup is an upper bound, whereas the real speedup in
simulation is a function of the computations performed and the
underlying architectures.
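
The speedup figures quoted in this section can be reproduced with a few
lines of arithmetic. The sketch below uses only numbers stated in the
text; the function and variable names are hypothetical.

```python
def theoretical_speedup(sequential_interval, array_interval):
    """TS: ratio of the intervals between consecutive results."""
    return sequential_interval / array_interval


ts = {
    "feature extraction": theoretical_speedup(60, 1),
    "primitive recognition": theoretical_speedup(39, 2),
    "string matching": theoretical_speedup(20, 3),
}

# Real speedups measured in the simulations, as quoted in the text
real = {"primitive recognition": 17.4, "string matching": 4.67}

for name, value in real.items():
    percent = round(100 * value / ts[name])
    print(f"{name}: TS = {ts[name]:.2f}, real = {value}, {percent}% of TS")
```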

The simulations are performed on a sequential computer (VAX 11/780). In
order to compare the simulation results with the results in Chapter III
we use the same high-level languages (C, Fortran and Pascal).
Therefore, there is considerable overhead in language translation and
program execution. These are some of the reasons for the low speedup.
Another reason is data movement, which can be performed in parallel
with the computation in VLSI arrays but not in a sequential computer.
One cannot accurately simulate the VLSI system even using an assembly
language. Since most systolic arrays are hardwired, i.e.,
unprogrammable, there is no instruction decoding or memory fetch and
storage for each instruction. Besides, the parallelism cannot be fully
simulated on a sequential computer. The real computation speeds of the
proposed VLSI arrays when fabricated should stay close to the
analytical results shown in the previous sections, i.e., 1 unit time
for feature extraction, 2 unit times for primitive recognition and 3
unit times for string matching using WLD.

We would now like to consider some problems of actual implementation
and give some examples of the performance of our proposed system. In
Section 5.2 we assumed that the length of the linear systolic array is
the same as the number of points in each segment. Although a linear
array can be expanded easily, it is sometimes necessary to use a small
array to process data of larger size. For example, in the seismic
recognition problem, the number of points in each segment is 60. We can
use a linear array consisting of 60 PE's, or we can use fewer PE's, for
example, 20 PE's. The implementation using 20 PE's is shown in Figure
5.17, where the data points in each segment are folded into three rows.
This takes three unit times to compute the features for each segment.
Suppose there are 20 PE's with a machine cycle of 200 ns; then the time
required for feature extraction of 2,000 segments (after the array
reaches steady state) is 3 × 18 × 2000 × 200 ns = 21.6 ms. The time for
reaching steady state is 20 unit times, which is equal to 20 × 18
cycles × 200 ns/cycle = 72 µs. Since feature extraction (18 cycles per
unit time) is faster than primitive recognition (19 cycles per unit
time), the output of the former can be used directly by the latter.
Recall that the input data for primitive recognition are interleaved by
one space (Figure 5.3(b)). The feature vector of the next segment is
not needed until 2 × 19 cycles = 38 cycles later. Therefore 30 PE's can
be used for feature extraction, producing a feature vector every 2 × 18
cycles = 36 cycles, because 30 PE's take 2 unit times to produce a
result and each unit time is equal to 18 cycles. These two operations
can be executed in parallel to save half of the total processing time.

Consider string matching using Levenshtein distance. The comparison of
one test string with each reference string takes 3 × 18 × 200 ns =
10.8 µs. With one hundred reference strings, it takes 1.08 ms to
classify each test string, and the test strings are processed
sequentially. Using a systolic array makes real-time string matching
possible. Our system can match approximately 90,000 strings per second
(10.8 µs per string).
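
The timing figures for feature extraction and string matching follow
directly from the cycle counts. The short sketch below recomputes them
from the quantities given in the text (200 ns machine cycle, 18 cycles
per unit time); the variable and function names are hypothetical.

```python
CYCLE_NS = 200     # machine cycle quoted in the text
UNIT_CYCLES = 18   # clock cycles per unit time


def cycles_to_ns(cycles):
    return cycles * CYCLE_NS


# Feature extraction with 20 PE's: 3 unit times per segment, 2,000 segments
feature_steady_ns = cycles_to_ns(3 * UNIT_CYCLES * 2000)
feature_fill_ns = cycles_to_ns(20 * UNIT_CYCLES)

# String matching: one distance every 3 unit times
match_one_ns = cycles_to_ns(3 * UNIT_CYCLES)
match_100_ns = 100 * match_one_ns
strings_per_second = 1e9 / match_one_ns

print(feature_steady_ns / 1e6, "ms for 2,000 segments")
print(feature_fill_ns / 1e3, "us to reach steady state")
print(match_one_ns / 1e3, "us per string comparison")
print(match_100_ns / 1e6, "ms for 100 reference strings")
print(int(strings_per_second), "string comparisons per second")
```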

We assumed in the previous discussion that all strings, test and
reference, have the same length. This is not true in many other
applications. Reference strings differ in length, so the dimensions of
the processor array cannot fit the string size exactly. The processor
array must be made larger than the string size and each string padded
with blanks at the end. If we let the weights of insertion, deletion
and substitution of blanks be zero, then we can solve the problem of
varying length and still maintain the regular, synchronous data flow
pattern. Long strings and a larger array size do not degrade the
steady-state throughput. As before, results can be obtained every
3 × 18 = 54 cycles. It only takes longer, i.e., p × 18 cycles where p
is the number of diagonals, to reach steady state. Usually the time to
reach steady state is negligible compared with the total processing
time.

The system bus shown in Figure 5.1 is similar to the Unibus of the DEC
PDP-11 (Kuck, 1978). The Unibus has a maximum data rate of 4 × 10^7
bits/sec operating in an interlocked way, i.e., the sender waits until
the receiver acknowledges receipt of a word before sending another
word. In our experiment, each seismic record has 1200 points, and each
point is coded into a 16-bit binary number. Therefore, each seismic
record needs 16 × 1200 = 19200 bits of storage. It is easy to see that
the system bus can transmit one seismic record from disc to the
special-purpose processor in 0.48 ms. However, the typical operating
speed of a magnetic disc is from 2.4 × 10^5 bits/sec to 1.2 × 10^7
bits/sec (Stone, 1980). Therefore the actual time for sending a seismic
record from disc to the special-purpose processor is from 80 ms down to
1.6 ms. The output from the special-purpose processor is the
classification result, which uses one word (16 bits) per seismic record
to indicate class membership. Its transmission time is 0.4 µs per
record.
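
The transfer times above can be checked with the following arithmetic
sketch, which uses only the record size and data rates stated in the
text; the names are hypothetical.

```python
RECORD_BITS = 16 * 1200   # 1200 points, 16 bits each
BUS_RATE = 4e7            # bits/sec, maximum bus data rate
DISC_SLOW = 2.4e5         # bits/sec, slow end of the disc range
DISC_FAST = 1.2e7         # bits/sec, fast end of the disc range


def transfer_ms(bits, rate_bits_per_sec):
    """Transfer time in milliseconds."""
    return 1e3 * bits / rate_bits_per_sec


bus_ms = transfer_ms(RECORD_BITS, BUS_RATE)
disc_slow_ms = transfer_ms(RECORD_BITS, DISC_SLOW)
disc_fast_ms = transfer_ms(RECORD_BITS, DISC_FAST)
result_us = 1e3 * transfer_ms(16, BUS_RATE)   # one 16-bit result word

print(RECORD_BITS, "bits per record")
print(bus_ms, "ms over the bus")
print(disc_slow_ms, "ms to", disc_fast_ms, "ms from disc")
print(result_us, "us for one classification result")
```

The disc, not the bus, bounds the input rate at the slow end of the
range, which is why the text quotes both figures.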


5.6  Concluding  Remarks 

We have proposed special-purpose array processors for seismic signal
classification, which can be attached to a general-purpose computer as
shown in Figure 5.1. The host computer can retrieve any intermediate
data from a special-purpose processor and store them in its own
storage, as well as send data to any memory unit of the special-purpose
processor. For example, the host computer can retrieve and store the
string representation of the signals for display or for later use. The
host computer can also use any one of the systolic arrays alone, for
example, only the feature extraction array.

The design correctness and speedup have been verified by simulations in
Section 5.5. From the simulation results it is safe to predict that the
real speedup of the fabricated VLSI processor arrays will be close to
the theoretical speedup. Computer-aided design has greatly reduced the
design cost (Swerling, 1982). The cost/performance ratio of
special-purpose processors will eventually be justified.

Recently, VLSI architectures have been applied to syntactic pattern
recognition and to the implementation of parallel computation. Guibas
et al. (1979) proposed two VLSI arrays for the implementation of
combinatorial algorithms: one is for a subset of dynamic programming
problems, i.e., optimal parenthesization problems, which include
context-free language recognition; the other is for transitive closure
problems, which include finite-state language recognition. Based on the
array structure of Guibas et al., Chu and Fu (1981) proposed VLSI
architectures for finite-state language recognition and for
context-free language recognition using the CYK algorithm. Chiang and
Fu (1982) also proposed a VLSI system for context-free language
recognition using Earley's algorithm. Ackland et al. (1981) developed a
VLSI system to implement dynamic time warping for spoken word
recognition. Our string matcher can be applied to any problem where the
Levenshtein distance computation is required. It can be used for string
matching in our seismic recognition, for character string matching in
information retrieval (Hall and Dowling, 1980), or for pattern matching
in shape analysis if the object can be represented by a string, for
example, using chain codes (see Fu, 1982). Our primitive recognizer can
also be applied to any minimum-distance recognition problem and to
vector pattern matching.
CHAPTER VI

SUMMARY,  CONCLUSIONS,  AND  RECOMMENDATIONS 

6.1  Summary 

We have studied the application of syntactic pattern recognition to
seismic signal classification and proposed special-purpose VLSI
architectures for the implementation. Our studies concentrate on
waveforms where shape information is not important or useful, such as
seismic signals. EEG and speech signals have characteristics similar to
those of seismic signals. Chapter I defines the problem of study,
outlines the approach to the problem and gives a survey of the relevant
literature. Chapter II discusses string similarity (distance) measures
and recognition procedures. String distances have been classified into
two categories: general string distances, which are based on the
concept of insertion, deletion and substitution transformations, and
special string distances. General string distances are further
classified into a hierarchy of four levels. The symmetric property of
string distance has also been discussed. Recognition can be carried out
by either the nearest-neighbor decision rule or error-correcting
parsing. We use a modified Earley's parsing algorithm which does not
require an expanded grammar and is able to use symmetric distance.


Chapter III presents the experimental results on seismic discrimination
and damage assessment. If shape is not the major feature, pattern
segmentation is often simpler, and we only need to consider fixed-length
segmentation. When shape information is the dominant feature, pattern
segmentation is usually associated with primitive recognition. Generally
speaking, a fixed-length segmentation is easier to perform, while a
variable-length segmentation is more efficient in representation.
However, a variable-length segmentation usually takes more time to
determine the optimal boundaries. Furthermore, a variable-length
segmentation sometimes starts from a fixed-length segmentation and then
merges or splits segments based on a preset criterion. In general, we
favor fixed-length segmentation provided a proper length can be easily
selected. Feature selection is problem-dependent; therefore we did not
emphasize this subject. Primitive recognition is our first major topic
in practical applications. Without any prior knowledge about the data,
we use a clustering procedure to find the optimal number of clusters.
Two criteria, the increment of merge distance and the pseudo F-statistic
(PFS), have been used to select the number of clusters, and they give
identical results. Finite-state grammars are inferred from the training
patterns using the k-tail inference algorithm. Unless the patterns are
really generated by a finite-state grammar, choosing small values of k
usually worsens the classification result. Our experiments show that an
uneven merging of states makes the inferred grammar perform poorly in
recognition. When the inferred grammar is the canonical grammar, the
recognition results of the NN rule and ECP are the same. In our
experiment, however, the NN rule takes much less computer time than ECP.
A modified dynamic time-warping system has been used to measure the
distance between the seismic waveforms of the building during a strong
earthquake. This measurement can be used for damage assessment.
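
One of the two cluster-number criteria mentioned above, the pseudo F-statistic, can be sketched as follows. The code assumes the common definition of the PFS as the ratio of between-cluster scatter per degree of freedom to within-cluster scatter per degree of freedom; the exact form used in Chapter III may differ, and the four two-dimensional points are made up for illustration.

```python
def pseudo_f(points, labels):
    """Pseudo F-statistic for n points in k clusters (requires 1 < k < n):
    (between-cluster scatter / (k - 1)) / (within-cluster scatter / (n - k))."""
    n, dim = len(points), len(points[0])
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    k = len(clusters)
    grand = [sum(p[d] for p in points) / n for d in range(dim)]
    between = within = 0.0
    for members in clusters.values():
        center = [sum(p[d] for p in members) / len(members) for d in range(dim)]
        between += len(members) * sum((center[d] - grand[d]) ** 2 for d in range(dim))
        within += sum((p[d] - center[d]) ** 2 for p in members for d in range(dim))
    return (between / (k - 1)) / (within / (n - k))


# Hypothetical two-feature vectors forming two well-separated groups.
points = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]
good = pseudo_f(points, [0, 0, 1, 1])   # grouping that follows the data
poor = pseudo_f(points, [0, 1, 0, 1])   # grouping that cuts across it
print(good, poor)
```

A larger value indicates compact, well-separated clusters, so the statistic can be evaluated for each candidate number of clusters and the peak taken as the preferred choice.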

Chapter IV introduces an attributed grammar and its parsing for signal
recognition in general, and seismic recognition in particular. If we
use a canonical grammar as the pattern grammar, it usually contains a
large number of production rules and nonterminal symbols. Using
attributes increases the descriptive power of the grammar and simplifies
its syntactic rules. We use a length attribute for the seismic grammar,
which reduces the number of productions and nonterminals by more than
90% relative to the nonattributed grammar. Attributed seismic grammars
also increase the recognition speed while maintaining the same
recognition accuracy.
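
The saving obtained from a length attribute can be pictured with a run-length view of a primitive string. The sketch below is only an illustration: it assumes that a nonattributed canonical grammar spends one production per symbol of a training string, whereas a length-attributed description needs one entry per run of identical symbols. The helper name `run_lengths` is hypothetical.

```python
from itertools import groupby


def run_lengths(s):
    """Collapse a string into (symbol, run length) pairs."""
    return [(sym, len(list(run))) for sym, run in groupby(s)]


x = "aaaaaabbbccc"
runs = run_lengths(x)
print(runs)                # [('a', 6), ('b', 3), ('c', 3)]
print(len(x), len(runs))   # symbols in the string vs. attributed symbols
```

For strings dominated by long runs, as is typical of slowly varying signals, the number of attributed symbols is a small fraction of the string length.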

Chapter V presents VLSI architectures for string matching, primitive
recognition and feature extraction. Although some special-purpose chips
have been developed for signal recognition, for example, spoken word
recognition, we make our systems as general as possible. That is to
say, our string matcher and primitive recognizer, with the exception of
the feature extractor, can be applied to any other pattern recognition
problem. They employ parallel processing and pipelined data flow so
that very fast throughput can be achieved. This improvement in speed
makes real-time pattern recognition possible.


6.2  Conclusions 


Syntactic pattern recognition has been pointed out as a promising
approach to seismic classification (Chen, 1978). While quite a few
statistical approaches have been proposed, we are the first to apply
syntactic approaches to this problem. With two simple features, our
approaches attain better results (91% correct rate) than most of the
existing statistical approaches (Tjostheim, 1975; Sarna and Stark,
1980). Our approaches also differ from the syntactic methods in
Chapter I in the treatment of primitive selection and grammar
construction. A clustering procedure along with two decision criteria
constitutes the primitive selection algorithm in our approach, while
heuristic approaches were used by others, e.g., Stockman et al. (1976).
Our pattern grammars are inferred from training samples, but most
pattern grammars for signal analysis are constructed manually. An
attributed grammar for the seismic application is proposed, which
could significantly reduce the grammar size and increase the
recognition speed. Finally, VLSI architectures are proposed for seismic
classification, which include feature extraction, primitive recognition
and string matching using the (weighted) Levenshtein distance. Our
string matcher differs from many contemporary implementations that
perform exact matching (e.g., Foster and Kung, 1980), which are not
suitable for pattern recognition applications because of noise and
other problems such as segmentation and primitive recognition errors;
the details are discussed in Chapter V. The computational results can be
produced at a constant rate, i.e., with constant time complexity, when
using our VLSI architectures with pipelined data flow. Although these
VLSI systems are developed for seismic classification, they can be
applied to other similar applications.


6.3  Recommendations 

Future work on syntactic seismic signal recognition can be divided into
two parts: algorithm development and high-speed implementation. (This
also applies to other signal recognition problems.) In algorithm
development, the possibility of using variable-length segmentation
should be explored. Stochastic grammars and parsing should be applied
when probabilistic information is available. The inclusion of semantic
information in pattern primitives is another approach (Tsai and Fu,
1980). A conventional pattern representation contains only syntactic
symbols. A typical speech pattern for dynamic time warping contains
only numerical information. A combination of the two will have both
syntactic and semantic information. The distance computation and
parsing of such patterns can be separated into syntactic deformation
and semantic deformation, and different weights can be assigned to
these two deformations. Feature extraction also needs further study;
linear predictive coefficients and features from the power spectrum are
good candidates.

After the algorithms are developed, they can often be implemented on a
parallel architecture, particularly on VLSI architectures. In our
string matcher, a global path constraint can be imposed, thereby
reducing the number of processors. The special-purpose chips can be
arranged in such a way that the output of one chip is used directly as
the input of another chip. Of course, this is possible only when all the
chips have the same processing speed; otherwise, buffers or latches are
required between the chips. Although these chips are for special
purposes, flexibility should also be considered. The more flexible the
chips are, the more applications they have, which makes their
manufacturing cheaper. This combination of algorithm development and
technological advance will make many pattern recognition applications
practical in both cost and speed.
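
The effect of a global path constraint can be sketched in software. The code below is a minimal illustration, assuming the constraint takes the form of a band |i - j| <= r around the main diagonal of the distance matrix and using unit weights; the function names and the band form are assumptions, and the mapping of matrix cells to processors in Chapter V is not modelled here.

```python
import math


def banded_edit_distance(x, y, r):
    """Levenshtein distance restricted to cells with |i - j| <= r.
    Returns math.inf when no alignment path fits inside the band."""
    n, m = len(x), len(y)
    prev = [j if j <= r else math.inf for j in range(m + 1)]
    for i in range(1, n + 1):
        curr = [math.inf] * (m + 1)
        for j in range(max(0, i - r), min(m, i + r) + 1):
            if j == 0:
                curr[0] = i  # delete the first i symbols of x
            else:
                curr[j] = min(prev[j - 1] + (x[i - 1] != y[j - 1]),  # match or substitute
                              prev[j] + 1,                            # delete
                              curr[j - 1] + 1)                        # insert
        prev = curr
    return prev[m]


def band_cells(n, m, r):
    """Number of matrix cells that lie inside the band."""
    return sum(1 for i in range(n + 1) for j in range(m + 1) if abs(i - j) <= r)


print(banded_edit_distance("kitten", "sitting", 1))
print(band_cells(20, 20, 2), "of", 21 * 21, "cells evaluated")
```

The banded result equals the unconstrained distance whenever an optimal alignment path stays inside the band; otherwise it is an upper bound, which is the trade-off accepted in exchange for the smaller amount of computation.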

The application of an attributed grammar with a length attribute to
speech recognition should also be investigated. Suppose two strings
x = aaaaaabbbccc and y = aaaabbcc represent different utterances of
the same word. If we use string matching and the NN rule for
classification, then d(x,y) ≠ 0 regardless of whether we use the
conventional Levenshtein distance or the weighted Levenshtein distance
(WLD). Ackroyd (1980) suggested a modified WLD which is computed by
subtracting |I - J| d_ID from the WLD, where I and J are the lengths of
the two strings, respectively, and d_ID is the weight for insertion and
deletion. Although this modified WLD can make d(x,y) = 0, it causes
other problems, for example, d(y,z) = 0 for z = aaa. The type 3 WLD
proposed in Chapter II can solve this problem by letting
D(a,a) = I(a,a) = 0 for all a ∈ Σ. However, there is one drawback: there
is no restriction on the number of insertions or deletions. An
attributed grammar using a length attribute can solve this problem
without side effects. For example, if string x is the training sample,
then the attributed grammar has the production S → ABC with inherited
length attributes ([6,4], [3,2], [3,2]) for (A, B, C). This attributed
grammar will accept both strings x and y, but
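
The behaviour of the length-corrected distance on these example strings can be checked numerically. The sketch below assumes unit weights for substitution, insertion and deletion, and the function names `wld` and `modified_wld` are chosen for illustration only; the correction term follows the description attributed to Ackroyd (1980) in the text.

```python
def wld(x, y, d_sub=1.0, d_id=1.0):
    """Weighted Levenshtein distance with a single weight d_id shared by
    insertions and deletions and a weight d_sub for substitutions."""
    prev = [j * d_id for j in range(len(y) + 1)]
    for i in range(1, len(x) + 1):
        curr = [i * d_id]
        for j in range(1, len(y) + 1):
            cost = 0.0 if x[i - 1] == y[j - 1] else d_sub
            curr.append(min(prev[j - 1] + cost, prev[j] + d_id, curr[j - 1] + d_id))
        prev = curr
    return prev[-1]


def modified_wld(x, y, d_sub=1.0, d_id=1.0):
    """WLD minus |I - J| * d_id, where I and J are the string lengths."""
    return wld(x, y, d_sub, d_id) - abs(len(x) - len(y)) * d_id


x, y, z = "aaaaaabbbccc", "aaaabbcc", "aaa"
print(wld(x, y), modified_wld(x, y))   # 4.0 0.0
print(wld(y, z), modified_wld(y, z))   # 5.0 0.0
```

The second line shows the side effect noted in the text: the correction also drives the distance between y and the much shorter string z to zero, although z contains no b or c at all.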

APPENDIX  A


FLOW  CHARTS  FOR  THE  SIMULATIONS 


Appendix A gives the flow charts for the simulations in Section 5.5.
Figure A.1 is the flow chart for feature extraction, Figure A.2 is the
flow chart for primitive recognition, and Figure A.3 is the flow chart
for string matching.

Figure A.1 Flow chart for the simulation of feature extraction.


APPENDIX  B 


STEP-BY-STEP  SIMULATION  RESULTS 

Table B.1 shows the intermediate results of feature extraction at each
time interval for the seismic signal shown in Figure B.1. The symbols
a, b, S, c, x, y and d are described in Figure 5.4(b). We use a
linearly connected array of 60 processors. Therefore, a specific seismic
segment takes 60 unit times to pass through the processor array. Since
the data can be pipelined, it takes only one unit time to extract the
features from each segment. The inputs to Table B.2 are the outputs from
Table B.1 after normalization. Table B.2 shows the intermediate results
of primitive recognition at each time interval. At times 3n-2 and 3n-1,
n = 1, 2, ..., 13, the simulation executes the 'compute' operation. At
time 3n, n = 1, 2, ..., 13, the simulation executes the 'compare'
operation. The symbols a, x, u, b, y and v of the 'compute' operation
are described in Figure 5.7(a). The symbols a, b, c, d, y and z of the
'compare' operation are described in Figure 5.7(b). A specific feature
vector takes 39 unit times to pass through the processor array. Since
the feature vectors can be pipelined, it takes two unit times to assign
a primitive to each feature vector. The output 'g' from Table B.2 is the
4th symbol of the second string in Table B.3. Table B.3 shows the
intermediate results of string matching using the weighted Levenshtein
distance. Since only substitution errors are considered, the computation
is straightforward. The symbols x, y, d and b represent the registers as
described in Figure 5.16. A specific pair of strings takes 39 unit times
to pass through the processor array. Since the strings can be pipelined,
it takes 3 unit times to compute the distance between an unknown string
and a reference string. The recognition result will not be known until
the unknown string has been compared against all the (100) reference
strings.
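
The timing figures quoted above follow the usual pipeline relation: the first result appears after the full latency of the array, and each later result follows one initiation interval after the previous one. The sketch below works this out. The function name and the batch of 20 segments are illustrative assumptions, and applying the relation to the 100 reference strings assumes that a new pair of strings can enter the array every 3 unit times.

```python
def pipelined_time(latency, interval, items):
    """Unit times needed to push `items` inputs through a pipeline whose
    first result appears after `latency` unit times and whose later
    results appear every `interval` unit times."""
    return latency + (items - 1) * interval


# Feature extraction: 60-processor array, one segment per unit time.
print(pipelined_time(60, 1, 20), "unit times pipelined vs", 60 * 20, "without overlap")

# String matching: latency 39, one distance every 3 unit times, 100 references.
print(pipelined_time(39, 3, 100), "unit times pipelined vs", 39 * 100, "without overlap")
```

For long input streams the latency term becomes negligible, which is the sense in which results are produced at a constant rate.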
TABLE B.1

The intermediate results of feature extraction at each
time interval for the seismic segment shown in Figure B.1
 t           a           b           S  c           x           y  d
12    0.657193    0.502741    5.251374  1    0.657193    5.251374  1
13    0.811646    0.657193    6.063020  1    0.811646    6.063020  1
14    0.657193    0.811646    6.720213  1    0.657193    6.720213  1
15    0.811646    0.657193    7.531859  1    0.811646    7.531859  1
16    0.811646    0.811646    8.343505  1    0.811646    8.343505  1
17    0.811646    0.811646    9.155150  1    0.811646    9.155150  1
18    0.811646    0.811646    9.966796  1    0.811646    9.966796  1
19    0.657193    0.811646   10.623989  1    0.657193   10.623989  1
20    0.657193    0.657193   11.281182  1    0.657193   11.281182  1
21    0.657193    0.657193   11.938375  1    0.657193   11.938375  1
22    0.657193    0.657193   12.595569  1    0.657193   12.595569  1
23    0.657193    0.657193   13.252762  1    0.657193   13.252762  1
24    0.657193    0.657193   13.909955  1    0.657193   13.909955  1
25    0.811646    0.657193   14.721601  1    0.811646   14.721601  1
26    1.120550    0.811646   15.842150  1    1.120550   15.842150  1
27    1.583906    1.120550   17.426056  1    1.583906   17.426056  1
28    2.356166    1.583906   19.782223  1    2.356166   19.782223  1
29    3.282878    2.356166   23.065102  1    3.282878   23.065102  1
30    3.900687    3.282878   26.965788  1    3.900687   26.965788  1
31    3.437330    3.900687   30.403118  1    3.437330   30.403118  1
32    1.275002    3.437330   31.678120  1    1.275002   31.678120  1
33   -2.122946    1.275002   33.801064  1   -2.122946   33.801064  2
34   -5.984246   -2.122946   39.785309  2   -5.984246   39.785309  2
35   -8.918835   -5.984246   48.704144  2   -8.918835   48.704144  2
36  -10.000000   -8.918835   58.704144  2  -10.000000   58.704144  2
37   -9.227739  -10.000000   67.931885  2   -9.227739   67.931885  2
38   -6.447603   -9.227739   74.379486  2   -6.447603   74.379486  2
39   -2.122946   -6.447603   76.502434  2   -2.122946   76.502434  2
40    2.201714   -2.122946   78.704147  2    2.201714   78.704147  3
41    6.371919    2.201714   85.076065  3    6.371919   85.076065  3
42    8.997603    6.371919   94.073669  3    8.997603   94.073669  3
43    9.769864    8.997603  103.843536  3    9.769864  103.843536  3
44    9.306508    9.769864  113.150047  3    9.306508  113.150047  3
45    8.070891    9.306508  121.220940  3    8.070891  121.220940  3
46    6.526371    8.070891  127.747314  3    6.526371  127.747314  3
47    4.672946    6.526371  132.420258  3    4.672946  132.420258  3
48    2.356166    4.672946  134.776428  3    2.356166  134.776428  3
49   -0.732875    2.356166  135.509308  3   -0.732875  135.509308  4
50   -3.821918   -0.732875  139.331223  4   -3.821918  139.331223  4
51   -6.910959   -3.821918  146.242188  4   -6.910959  146.242188  4
52   -8.764383   -6.910959  155.006577  4   -8.764383  155.006577  4
53   -8.918835   -8.764383  163.925415  4   -8.918835  163.925415  4
54   -8.146575   -8.918835  172.071991  4   -8.146575  172.071991  4
55   -7.065411   -8.146575  179.137405  4   -7.065411  179.137405  4
56   -5.984246   -7.065411  185.121658  4   -5.984246  185.121658  4
57   -4.903082   -5.984246  190.024734  4   -4.903082  190.024734  4
58   -3.667466   -4.903082  193.692200  4   -3.667466  193.692200  4
59   -1.814041   -3.667466  195.506241  4   -1.814041  195.506241  4
60    0.193837   -1.814041  195.700073  4    0.193837  195.700073  5
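
The pattern in Table B.1 can be reproduced with a very small per-sample update. The sketch below is inferred from the tabulated values only, not from Figure 5.4(b): it assumes that the running sum grows by the absolute value of the incoming sample and that the counter is incremented whenever the incoming sample and the previous sample have opposite signs. The function name `feature_cell` is hypothetical.

```python
def feature_cell(a, prev_sample, running_sum, crossings):
    """One update step: accumulate |a| and count a sign change
    between the previous sample and the current one."""
    new_sum = running_sum + abs(a)
    if a * prev_sample < 0:
        crossings += 1
    return new_sum, crossings


# State after t = 30 and the samples a at t = 31..34, taken from Table B.1.
b, S, c = 3.900687, 26.965788, 1
samples = [3.437330, 1.275002, -2.122946, -5.984246]

history = []
for a in samples:
    S, c = feature_cell(a, b, S, c)
    history.append((round(S, 4), c))
    b = a  # the current sample becomes the previous sample of the next step
print(history)
```

The sums agree with the S column to within the rounding of the printed single-precision values, and the counter changes from 1 to 2 at t = 33, where the sample first becomes negative.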
TABLE B.2

The intermediate results of primitive recognition at
each time interval for the feature vector from Table B.1
after normalization.

t =  1   a = .000         x = -.161     u = -1.718       b = 2.426     y = -.161   v = -1.718
t =  2   a = 2.426        x = 19.252    u = -2.108       b = 458.711   y = 19.252  v = -2.108
t =  3   a = **********   b = 458.711   c = [illegible]  y = 458.711   z = 'a'
t =  4   a = .000         x = -.161     u = 3.337        b = 12.232    y = -.161   v = 3.337
t =  5   a = 12.232       x = 19.252    u = -1.740       b = 452.920   y = 19.252  v = -1.740
t =  6   a = 458.711      b = 452.920   c = 'a'          y = 452.920   z = 'b'
t =  7   a = .000         x = -.161     u = -.180        b = .000      y = -.161   v = -.180
t =  8   a = .000         x = 19.252    u = -2.387       b = 468.287   y = 19.252  v = -2.387
t =  9   a = 452.920      b = 468.287   c = 'b'          y = 452.920   z = 'b'
t = 10   a = .000         x = -.161     u = -1.229       b = 1.142     y = -.161   v = -1.229
t = 11   a = 1.142        x = 19.252    u = .987         b = 334.762   y = 19.252  v = .987
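
The 'compute' and 'compare' steps of Table B.2 can be mimicked in a few lines. The sketch below assumes, on the evidence of the tabulated values, that 'compute' accumulates a squared difference between a feature and a cluster-center coordinate and that 'compare' keeps the smaller of two accumulated distances together with its primitive label. Only the first four cluster centers visible in the table are used, and the labels 'c' and 'd' for the third and fourth centers are assumptions.

```python
import math


def compute(acc, x, u):
    """'compute' step: add one squared feature difference to the accumulator."""
    return acc + (x - u) ** 2


def compare(best_dist, best_label, dist, label):
    """'compare' step: keep the smaller distance and its primitive label."""
    return (dist, label) if dist < best_dist else (best_dist, best_label)


feature = (-0.161, 19.252)                   # x values in Table B.2
centers = {"a": (-1.718, -2.108),            # u values in Table B.2
           "b": (3.337, -1.740),
           "c": (-0.180, -2.387),            # label assumed
           "d": (-1.229, 0.987)}             # label assumed

best = (math.inf, None)
distances, winners = [], []
for label, center in centers.items():
    acc = 0.0
    for x, u in zip(feature, center):
        acc = compute(acc, x, u)
    best = compare(best[0], best[1], acc, label)
    distances.append(acc)
    winners.append(best[1])
print(winners)
```

The first three winners are 'a', 'b' and 'b', matching the z column at t = 3, 6 and 9, and the accumulated distances agree with the b column to within the rounding of the printed inputs. In the full simulation the comparison continues over all 13 cluster centers.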

TABLE B.3

The intermediate results of string matching
at each time interval between strings 'acaghfijjmjfmmkmjjjm'
and 'mklgifdifhffmkillibb'. The output d = 5.742 is the distance
between these two strings.

t = 17   x = 'j'   y = 'f'   d = 2.654
t = 18   x = 'm'   y = 'f'   b = 2.654
t = 19   x = 'm'   y = 'h'   d = 2.924
t = 20   x = 'j'   y = 'h'   b = 2.924
t = 21   x = 'j'   y = 'f'   d = 3.113
t = 22   x = 'f'   y = 'f'   b = 3.113
t = 23   x = 'f'   y = 'f'   d = 3.113
t = 24   x = 'm'   y = 'f'   b = 3.113
t = 25   x = 'm'   y = 'm'   d = 3.113
t = 26   x = 'm'   y = 'm'   b = 3.113
t = 27   x = 'm'   y = 'k'   d = 3.306
t = 28   x = 'k'   y = 'k'   b = 3.306
t = 29   x = 'k'   y = 'i'   d = 3.515
t = 30   x = 'm'   y = 'i'   b = 3.515
t = 31   x = 'm'   y = 'l'   d = 3.882
t = 32   x = 'j'   y = 'l'   b = 3.882
t = 33   x = 'j'   y = 'l'   d = 4.099
t = 34   x = 'j'   y = 'l'   b = 4.099
t = 35   x = 'j'   y = 'i'   d = 4.454
t = 36   x = 'j'   y = 'i'   b = 4.454
t = 37   x = 'j'   y = 'b'   d = 5.173
t = 39   x = 'm'   y = 'b'   d = 5.742
